histsdk/docs/plans/grpc-tooling-completion.md
Joseph Doherty b3417c2f6a docs(grpc): record DelTep multiplexed-channel probe as disproven
README transport matrix + grpc-tooling-completion.md §Out-of-scope: the gRPC
multiplexed-channel hypothesis for DeleteTagExtendedProperties was probed live
2026-06-22 and disproven — primes succeed on the shared channel but DelTep is
still rejected (native code=1), property survives. Stays server-blocked on both
transports, not shipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-22 06:55:05 -04:00


gRPC Tooling Completion Plan

Status as of 2026-06-22. Tracks the remaining work to finish tooling the AVEVA Historian SDK's RemoteGrpc (2023 R2) transport so it reaches WCF surface parity. Self-contained for pickup after context compaction.

Where things stand

The gRPC transport already tools: probe, raw/aggregate/at-time reads, browse, metadata, system-parameter, server time-zone, measured store-forward status, AddHistoricalValues backfill write, and (newest, branch grpc-config-ops, 3 commits, NOT yet merged — main = 035d8a9):

  • GetRuntimeParameterAsync: live-verified
  • GetTagExtendedPropertiesAsync (read): live-verified
  • ExecuteSqlCommandAsync: server-walled, bounded behind ProtocolEvidenceMissingException
  • EnsureTag / DeleteTag / RenameTags / AddTagExtendedProperties 🧪: tooled + routed, sandbox-gated; live-verified against a sandbox tag 2026-06-22 (item #1 below)
  • ReadEventsAsync ⚠️: tooled + routed 2026-06-22 (item #2 below). The chain runs and StartEventQuery succeeds, but GetNextEventQueryResultBuffer long-polls when there is no data; the read is hard-bounded (≤30s) and throws ProtocolEvidenceMissingException on the no-row path. Row retrieval is pending an event-bearing server.

Test baseline: 317 offline green, 19 gRPC-live green. Relevant memory: project_grpc_config_ops_tooling, project_m0_grpc_parity, project_roadmap_exhausted_2020wcf, reference_2023r2_live_server_access, reference_wonder_sql_vd03_credentials.

Proven pattern (reuse for everything below)

A WCF config op is tooled over gRPC by reusing its existing byte serializer/parser verbatim inside the protobuf bytes fields, keyed by the Open2 session handle:

  • HistorianGrpcConnection connection = HistorianGrpcChannelFactory.Create(options);
  • HistorianGrpcHandshake.Session session = HistorianGrpcHandshake.OpenSession(connection, options, ct[, connectionMode]);
    • session.StringHandle = uppercase Open2 GUID → string-handle ops (Retrieval/Status/History string-handle RPCs).
    • session.ClientHandle = transient uint → uint-handle ops (StartQuery, DeleteTags, GetNext*).
    • write ops pass connectionMode: HistorianWcfAuthChainHelper.NativeIntegratedWriteEnabledConnectionMode (0x401).
  • Call new <Service>.<Service>Client(connection.Channel).<Rpc>(request, connection.Metadata, DateTime.UtcNow.Add(options.RequestTimeout), ct).
  • Check response.Status?.BSuccess; decode error via response.Status?.BtError (hex = native byte0 0x84 + LE u32 code, often followed by facility/file/message ASCII — this decode cracked the SQL + extended-prop cases).
  • The gRPC RetrievalService string-handle ops do NOT need the WCF Retr.GetV prime.
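The BtError decode mentioned in the list above can be sketched as follows. This is a minimal illustration, assuming only the layout the list states (byte 0 is the 0x84 marker, followed by a little-endian u32 code, optionally followed by ASCII text). `NativeError` and `NativeErrorDecoder` are hypothetical names for this sketch, not SDK types.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Hypothetical result type for illustration; not part of the SDK.
public readonly record struct NativeError(uint Code, string Trailer);

public static class NativeErrorDecoder
{
    private const byte ErrorMarker = 0x84; // marker in byte 0; it is not an error code

    // Returns false when the buffer is shorter than 5 bytes or lacks the marker.
    public static bool TryDecode(ReadOnlySpan<byte> buffer, out NativeError error)
    {
        error = default;
        if (buffer.Length < 5 || buffer[0] != ErrorMarker)
        {
            return false;
        }

        uint code = BinaryPrimitives.ReadUInt32LittleEndian(buffer.Slice(1, 4));

        // Bytes after the 5-byte header are rendered as printable ASCII
        // (facility/file/message); anything non-printable becomes '.'.
        var trailer = new StringBuilder(buffer.Length - 5);
        foreach (byte b in buffer.Slice(5))
        {
            trailer.Append(b is >= 0x20 and < 0x7F ? (char)b : '.');
        }

        error = new NativeError(code, trailer.ToString());
        return true;
    }
}
```

For the 5-byte buffer `84 01 00 00 00` this yields Code = 1 with an empty trailer. If BtError is exposed as a protobuf ByteString, its `Span` property could be passed in directly.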

Proto field-name reference and WCF serializer signatures: see the mapping captured in project_grpc_config_ops_tooling memory and Grpc/Protos/*.proto.

Remaining items (priority order)

1. Live-verify the write ops — DONE 2026-06-22

Outcome: ran the gated lifecycle against a synthetic sandbox tag (ZZ_SdkGrpcWriteProbe); the writes flip from 🧪 to live-verified. EnsureTags (create), AddTagExtendedProperties, StartJob rename, and DeleteTags all succeed live over gRPC (write-enabled 0x401 session, WCF serializers reused), with NO priming discovery-dance needed.

Two findings:

  • (a) Rename is an async StartJob that the server can transiently reject right after the create commits, and on target-name collision. The test now pre-cleans both names and retries rename (4×); callers should likewise retry.
  • (b) Reading a written extended property back via GetTagExtendedPropertiesAsync hit a shared-parser evidence gap (value marker 0x01 where the parser expects compact-string 0x09). This is a read-side gap, not a write failure; the test tolerates it. It is resolved by the read-side follow-up below.

The lifecycle test cleans up after itself on a best-effort basis: rename is async and the browse/metadata view is eventually consistent, so a hard absence assert would be racy.

Read-side follow-up DONE 2026-06-22: captured the live GetTagExtendedPropertiesFromName bytes and fixed the parser. The response is one group per property (the tag name repeats) with a uint16 searchability-flags trailer per property (e.g. 0x0003 built-in, 0x0001 user-added), NOT the 1-byte group trailer the old model assumed. That wrong trailer size drifted the parse by one byte per group, which produced the 0x09-vs-0x01 mismatch. A written prop now round-trips end-to-end live; a golden multi-group test was added.
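Finding (a) asks callers to retry a transiently rejected rename. A minimal sketch of such a retry loop is shown here. `TransientRetry` is a hypothetical helper, and the delay between attempts is an assumption: the plan records only the attempt count (4×), not a wait time.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of the SDK.
public static class TransientRetry
{
    // Runs `attempt` up to maxAttempts times, waiting `delay` between tries.
    // `attempt` returns true on success; false means the server rejected the call.
    public static async Task<bool> RunAsync(
        Func<CancellationToken, Task<bool>> attempt,
        int maxAttempts,
        TimeSpan delay,
        CancellationToken ct)
    {
        ArgumentNullException.ThrowIfNull(attempt);
        ArgumentOutOfRangeException.ThrowIfLessThan(maxAttempts, 1);

        for (int i = 1; i <= maxAttempts; i++)
        {
            if (await attempt(ct).ConfigureAwait(false))
            {
                return true;
            }

            // No wait after the final attempt; the caller sees the failure at once.
            if (i < maxAttempts)
            {
                await Task.Delay(delay, ct).ConfigureAwait(false);
            }
        }

        return false;
    }
}
```

A caller might wrap a single rename attempt as `await TransientRetry.RunAsync(token => RenameOnceAsync(oldName, newName, token), maxAttempts: 4, delay: TimeSpan.FromSeconds(1), ct)`, where `RenameOnceAsync` stands in for whatever issues one StartJob rename and reports success.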

Original notes:

  • Goal: flip the 🧪 writes to live-verified by running the gated lifecycle test against a sandbox tag.
  • How: set HISTORIAN_GRPC_WRITE_SANDBOX_TAG to a throwaway name and run TagWriteLifecycle_OverGrpc_CreatesAddsPropRenamesDeletes against the live 2023 R2 box.
  • Risk/gotcha: if any write is rejected, the first fix is to add the WCF write priming discovery-dance (HistorianWcfTagWriteOrchestrator.RunWritePriming: UpdC3 + 6 GetSystemParameter + AllowRenameTags + Trx/Stat/Retr GetV) to HistorianGrpcTagWriteOrchestrator over the gRPC StatusService/HistoryService. Rename also needs server AllowRenameTags enabled. Needs explicit user OK to mutate the shared server (they previously chose "no live mutate").
  • Files: tests/.../HistorianGrpcIntegrationTests.cs (run only), src/.../Grpc/HistorianGrpcTagWriteOrchestrator.cs (priming only if rejected).

2. ReadEvents over gRPC (heaviest read op) — TOOLED 2026-06-22 (rows pending event-bearing server)

Outcome: ReadEventsAsync is routed over gRPC (HistorianGrpcEventOrchestrator). The CM_EVENT registration replay (UpdateClientStatus → 6 GetSystemParameter → RegisterTags → cross-service version probes → EnsureTags, captured buffers shared with WCF via HistorianEventRegistrationProtocol) runs, and StartEventQuery succeeds live.

The blocker that remains is server behavior, not the port: GetNextEventQueryResultBuffer long-polls when the query has no rows. It blocks until the call deadline instead of returning the synchronous 5-byte type=4 code=85 terminal that the 2020 WCF op returns. Per-call gRPC-Web deadlines proved unreliable over the tunnel (a chain with 4s deadlines still ran >90s), so the read is hard-bounded by an overall linked-CTS budget (≤30s, scaled to RequestTimeout); gRPC honors token cancellation. On the no-row path the orchestrator throws ProtocolEvidenceMissingException rather than report a false-empty list.

The idle dev box holds no events, so row-level retrieval is not yet live-verified. Once an event-bearing 2023 R2 server is available, flip the gated test ReadEventsAsync_OverGrpc_StartsQueryButRowRetrievalIsLongPollBlocked to assert parsed rows, and consider whether the long-poll needs a "fetch historical then stop" request flag that the native client may set. README row is ⚠️.
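The overall linked-CTS budget described above can be sketched as follows. This is an illustration under stated assumptions, not the orchestrator's code: `BoundedCall` is a hypothetical helper, the budget formula (RequestTimeout capped at 30s) is a guess at what "scaled to RequestTimeout" means, and TimeoutException stands in for the ProtocolEvidenceMissingException the orchestrator throws.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of the SDK.
public static class BoundedCall
{
    // Assumption: the budget is the request timeout capped at 30 s.
    // The plan does not document the exact scaling.
    public static TimeSpan ComputeBudget(TimeSpan requestTimeout)
    {
        TimeSpan cap = TimeSpan.FromSeconds(30);
        return requestTimeout < cap ? requestTimeout : cap;
    }

    // Runs `operation` under a token that fires when either the caller cancels
    // or the overall budget elapses, whichever comes first.
    public static async Task<T> RunAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        TimeSpan budget,
        CancellationToken callerToken)
    {
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        linked.CancelAfter(budget);

        try
        {
            return await operation(linked.Token).ConfigureAwait(false);
        }
        catch (OperationCanceledException) when (!callerToken.IsCancellationRequested)
        {
            // Budget expiry rather than caller cancellation: surface it as a timeout.
            throw new TimeoutException($"Operation exceeded its {budget.TotalSeconds:0.#}s budget.");
        }
    }
}
```

The budget covers the whole chain rather than a single RPC, which is why it still holds when per-call deadlines are not honored. One caveat: Grpc.Net.Client by default reports a cancelled call as an RpcException with StatusCode.Cancelled rather than an OperationCanceledException, so a real gRPC caller would likely need to handle that case as well.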

Original notes (still the reference for the registration replay):

  • Goal: route ReadEventsAsync over gRPC.
  • RPCs (exist): RetrievalService.StartEventQuery (uiHandle, uiQueryRequestType, btRequest) → {Status, uiQueryHandle, btResonse}; GetNextEventQueryResultBuffer (uiHandle, uiQueryHandle) → {Status, btResult}; EndEventQuery.
  • Reuse: HistorianEventQueryProtocol.CreateStartEventQueryAttempts(...) for the request buffer (QueryRequestTypeEvent), HistorianEventRowProtocol.Parse(...) for rows.
  • The hard part: port the CM_EVENT registration state machine. Without it, GetNextEventQueryResultBuffer returns native error type=4 code=85. WCF does this in HistorianWcfEventOrchestrator.AddCmEventTagViaAddT: UpdC3 → 6 system params → RegisterTags2 (CM_EVENT tag id 353b8145-5df0-4d46-a253-871aef49b321, 24-byte RTag2 buffer) → cross-service GetV → EnsureTags2 (CM_EVENT CTagMetadata via HistorianAddTagsProtocol.SerializeCmEventCTagMetadata). gRPC equivalents: HistoryService.RegisterTags, HistoryService.EnsureTags, HistoryService.UpdateClientStatus, StatusService.GetSystemParameter.
  • Approach: new Grpc/HistorianGrpcEventOrchestrator. Open a read-only session, replay the registration over gRPC (RegisterTags + EnsureTags + the discovery calls), then run StartEventQuery → loop GetNextEventQueryResultBuffer → EndEventQuery, parsing rows. Route in Historian2020ProtocolDialect.ReadEventsAsync on UseGrpc.
  • Verify: live (read-only, safe) against the 2023 R2 box; dev box may return no rows (env) — assert "no error 85 + chain completes," mirror the WCF event test.
  • Risk: medium-high. Registration may need exact call ordering; capture the error buffer (hex+ASCII) at each step if code 85 persists.
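For the "capture the error buffer (hex+ASCII) at each step" advice in the risk note, a small formatter along these lines may be enough. `BufferDump` is a hypothetical helper written for this sketch; the output format is an arbitrary choice.

```csharp
using System;
using System.Text;

// Hypothetical diagnostic helper for illustration; not part of the SDK.
public static class BufferDump
{
    // Formats a buffer as "HEX | ASCII". Non-printable bytes show as '.'.
    // Example: 84 01 00 00 00 formats as "84 01 00 00 00 | ....."
    public static string HexAndAscii(ReadOnlySpan<byte> buffer)
    {
        var hex = new StringBuilder(buffer.Length * 3);
        var ascii = new StringBuilder(buffer.Length);

        foreach (byte b in buffer)
        {
            if (hex.Length > 0)
            {
                hex.Append(' ');
            }

            hex.Append(b.ToString("X2"));
            ascii.Append(b is >= 0x20 and < 0x7F ? (char)b : '.');
        }

        return $"{hex} | {ascii}";
    }
}
```

Logging this string after each registration call keeps the raw bytes and any embedded facility/file/message text side by side, which makes it easier to see which step first returns an error buffer.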

3. SendEvent over gRPC

  • Goal: route SendEventAsync over gRPC.
  • Blocker: no distinct event-send RPC; WCF rides AddStreamValues2 (the HistorianEventWriteProtocol.SerializeAddStreamValuesBuffer VTQ). The gRPC framing is uncaptured — needs a native-client gRPC capture before implementing (per "capture first, never guess"). Depends on #2 (same CM_EVENT registration).
  • Risk: high / blocked on capture. Lowest priority.

4. (Stretch) SQL server-wall investigation — RegisterTags prime does NOT help (2026-06-22)

  • ExecuteSqlCommand over gRPC faults server-side in CSrvDbConnection.ExecuteSqlCommand (IndexOutOfRange / native err 38). Tried the HistoryService.RegisterTags-family prime before ExecuteSqlCommand on both read-only (0x402) and write-enabled (0x401) sessions: it does not clear the wall — RegisterTags itself returned false and ExecuteSqlCommand faulted with the identical native-38 error (decoded buffer: ...CSrvDbConnection.ExecuteSqlCommand ... System.IndexOutOfRangeException). So unlike OpenStorageConnection, the SQL DB-connection context is NOT established by the RegisterTags family. The op stays bounded behind ProtocolEvidenceMissingException; use WCF for SQL. Remaining avenues are deeper (reproduce the server-side DB connection-string/index setup the native client triggers) — low priority.

5. GetConnectionStatus over gRPC — DONE 2026-06-22

  • HistorianGrpcStatusClient.GetConnectionStatusAsync synthesizes the status from a measured gRPC handshake (OpenConnection yielding a storage-session GUID ⇒ connected), mirroring the WCF synthesize-from-probe approach. Routed in Historian2020ProtocolDialect on UseGrpc (the WCF path used the MDAS binding, which can't reach the gRPC port). Live-verified; store-forward connectivity stays false (D2-gated). Gated test GetConnectionStatusAsync_OverGrpc_ReportsConnected.

Out of scope

  • ReadBlocks (StartBlockRetrievalQuery) — never captured on either transport; leave throwing ProtocolEvidenceMissingException.
  • DeleteTagExtendedProperties: PROBED 2026-06-22, multiplexed-channel hypothesis DISPROVEN. The WCF block (the server resolves the property from a per-connection working set that the SDK's separate per-service channels can't populate) is NOT lifted by gRPC. The probe (HistorianGrpcTagWriteOrchestrator.ProbeDeleteTagExtendedPropertiesAsync) runs the native GetTgByNm → GetTepByNm → DelTep sequence over ONE write-enabled (0x401) session on gRPC's single shared channel. Live against the 2023 R2 server (History iface 12): both primes succeed on the shared channel (TgPrimeBytes=98, TepPrimePages=1), yet DelTep is still rejected with native code=1 (the 5-byte error buffer's byte0=132 is the universal 0x84 marker, not a code) and the property survives. Conclusion: the working set the server consults is populated by something the SDK can't reproduce even over one connection, most likely the native client's in-process registration object rather than the wire session. Stays server-blocked on BOTH transports; not shipped publicly. Pinned by the gated negative test DeleteTagExtendedProperties_OverGrpc_ProbeMultiplexedChannel (flips if a future server/registration lifts the wall).

Live verification setup (every live run)

Tunnel to WONDER-SQL-VD03 must be up (gRPC localhost:32565, TLS, cert CN WONDER-SQL-VD03; hosts entry present). Creds in gitignored wonder-sql-vd03.txt (QUOTED, colon-delimited — strip quotes; use the domainusername/domainpassword NAM domain account, which works for Historian gRPC; wonderapp does NOT). Env:

HISTORIAN_GRPC_HOST=wonder-sql-vd03  HISTORIAN_GRPC_PORT=32565
HISTORIAN_GRPC_TLS=true  HISTORIAN_GRPC_DNSID=WONDER-SQL-VD03
HISTORIAN_USER=<domain user>  HISTORIAN_PASSWORD=<domain pass>
HISTORIAN_TEST_TAG=SysTimeSec
# writes only, destructive: HISTORIAN_GRPC_WRITE_SANDBOX_TAG=<throwaway>
# slow links: HISTORIAN_GRPC_TIMEOUT=120
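The variables above could be read into a settings object along the following lines. This is only a sketch: `LiveGrpcSettings` and `LiveGrpcSettingsLoader` are hypothetical names, the real test fixtures may read the environment differently, and treating HISTORIAN_GRPC_TIMEOUT as whole seconds is an assumption.

```csharp
using System;

// Hypothetical settings holder for the gated live tests; not an SDK type.
public sealed record LiveGrpcSettings(
    string Host, int Port, bool UseTls, string? DnsId,
    string? User, string? Password, string? TestTag,
    string? WriteSandboxTag, TimeSpan? Timeout);

public static class LiveGrpcSettingsLoader
{
    // Returns null when host or port is missing, so a gated test can skip
    // instead of fail. `lookup` is normally Environment.GetEnvironmentVariable;
    // it is a parameter so the loader can be exercised without real env vars.
    public static LiveGrpcSettings? TryLoad(Func<string, string?> lookup)
    {
        string? host = lookup("HISTORIAN_GRPC_HOST");
        if (string.IsNullOrWhiteSpace(host) ||
            !int.TryParse(lookup("HISTORIAN_GRPC_PORT"), out int port))
        {
            return null;
        }

        // Assumption: the timeout variable is a whole number of seconds.
        TimeSpan? timeout = int.TryParse(lookup("HISTORIAN_GRPC_TIMEOUT"), out int seconds)
            ? TimeSpan.FromSeconds(seconds)
            : null;

        return new LiveGrpcSettings(
            host,
            port,
            UseTls: bool.TryParse(lookup("HISTORIAN_GRPC_TLS"), out bool tls) && tls,
            DnsId: lookup("HISTORIAN_GRPC_DNSID"),
            User: lookup("HISTORIAN_USER"),
            Password: lookup("HISTORIAN_PASSWORD"),
            TestTag: lookup("HISTORIAN_TEST_TAG"),
            // Destructive write tests stay disabled unless a sandbox tag is set.
            WriteSandboxTag: lookup("HISTORIAN_GRPC_WRITE_SANDBOX_TAG"),
            Timeout: timeout);
    }
}
```

A test would call `LiveGrpcSettingsLoader.TryLoad(Environment.GetEnvironmentVariable)` and skip when the result is null, or when WriteSandboxTag is null for the write lifecycle.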

Run a subset: dotnet test ./Histsdk.slnx --no-build --filter "FullyQualifiedName~<name>". Aggregate tests self-calibrate their window from a real raw sample (the box is idle/not-collecting). Sanitization scan before any commit: wonder-sql-vd03|zimmer|nam\\|dohertj2|ADOBuild over commit-safe files.
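The sanitization scan can be run with any regex tool; as a sketch, the same pattern applied in C# might look like this. `SanitizationScan` is a hypothetical helper, and case-insensitive matching is an assumption, since the plan gives only the pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical pre-commit helper for illustration; not part of the SDK.
public static class SanitizationScan
{
    // Pattern copied from the plan. In this verbatim string, nam\\ is the regex
    // for "nam" followed by one literal backslash. IgnoreCase is an assumption.
    private static readonly Regex Forbidden = new(
        @"wonder-sql-vd03|zimmer|nam\\|dohertj2|ADOBuild",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns (1-based line number, matched text) for every hit in the text.
    public static List<(int Line, string Text)> Find(string text)
    {
        var hits = new List<(int Line, string Text)>();
        string[] lines = text.Split('\n');

        for (int i = 0; i < lines.Length; i++)
        {
            foreach (Match m in Forbidden.Matches(lines[i]))
            {
                hits.Add((i + 1, m.Value));
            }
        }

        return hits;
    }
}
```

Running `Find` over the contents of each commit-safe file and failing when the list is non-empty gives a line-level report of what to replace with placeholders.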

Standing constraints

  • Never commit credentials/hostnames/customer tag names/raw captures — placeholders only.
  • src/ stays pure managed .NET 10 (one allowed P/Invoke: SSPI). Never modify current/ or aveva-install-*/.
  • Commit only when asked; branch first if on main; required footers (Co-Authored-By + Claude-Session). Capture wire bytes before implementing — never guess.