histsdk/docs/plans/grpc-tooling-completion.md
Joseph Doherty 27e969f86d docs(grpc): transport matrix + plan reflect ReadEvents + live-verified writes
- README transport matrix: gRPC writes (EnsureTag/DeleteTag/RenameTags/
  AddTagExtendedProperties) flip to live-verified; note the async-rename retry and
  the extended-property read-back parser gap. ReadEvents gRPC -> tooled-but-bounded
  (StartEventQuery works, GetNext long-polls, throws on no-row pending an
  event-bearing server). Refresh the closing production-pattern guidance.
- grpc-tooling-completion.md: mark items #1 (writes, done) and #2 (ReadEvents,
  tooled/bounded) with the live outcomes and follow-ups.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-22 04:58:44 -04:00

# gRPC Tooling Completion Plan
Status as of 2026-06-22. Tracks the remaining work to finish tooling the AVEVA
Historian SDK's `RemoteGrpc` (2023 R2) transport so it reaches WCF surface parity.
Self-contained for pickup after context compaction.
## Where things stand
The gRPC transport already tools: probe, raw/aggregate/at-time reads, browse,
metadata, system-parameter, server time-zone, measured store-forward status,
`AddHistoricalValues` backfill write, **and** (newest, branch `grpc-config-ops`,
3 commits, NOT yet merged — `main` = `035d8a9`):
- `GetRuntimeParameterAsync` — ✅ live-verified
- `GetTagExtendedPropertiesAsync` (read) — ✅ live-verified
- `ExecuteSqlCommandAsync` — ⛔ server-walled, bounded behind `ProtocolEvidenceMissingException`
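- `EnsureTag` / `DeleteTag` / `RenameTags` / `AddTagExtendedProperties`: ✅ live-verified 2026-06-22 against a sandbox tag (item #1 below); callers should retry rename, and reading a written extended property back still hits a read-side parser gap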
- `ReadEventsAsync` — ⚠️ tooled + routed 2026-06-22 (item #2 below): chain runs, `StartEventQuery` succeeds, but `GetNextEventQueryResultBuffer` long-polls on no data; hard-bounded (≤30s) and throws `ProtocolEvidenceMissingException` on the no-row path. Row retrieval pending an event-bearing server.
Test baseline: 317 offline green, 19 gRPC-live green. Relevant memory:
`project_grpc_config_ops_tooling`, `project_m0_grpc_parity`,
`project_roadmap_exhausted_2020wcf`, `reference_2023r2_live_server_access`,
`reference_wonder_sql_vd03_credentials`.
## Proven pattern (reuse for everything below)
A WCF config op is tooled over gRPC by reusing its **existing byte serializer/parser
verbatim** inside the protobuf `bytes` fields, keyed by the Open2 session handle:
- `HistorianGrpcConnection connection = HistorianGrpcChannelFactory.Create(options);`
- `HistorianGrpcHandshake.Session session = HistorianGrpcHandshake.OpenSession(connection, options, ct[, connectionMode]);`
- `session.StringHandle` = uppercase Open2 GUID → **string-handle** ops (Retrieval/Status/History string-handle RPCs).
- `session.ClientHandle` = transient `uint` → **uint-handle** ops (StartQuery, DeleteTags, GetNext*).
- write ops pass `connectionMode: HistorianWcfAuthChainHelper.NativeIntegratedWriteEnabledConnectionMode` (0x401).
- Call `new <Service>.<Service>Client(connection.Channel).<Rpc>(request, connection.Metadata, DateTime.UtcNow.Add(options.RequestTimeout), ct)`.
- Check `response.Status?.BSuccess`; on failure, decode `response.Status?.BtError`. Its bytes are the native marker `0x84` at byte 0, then a little-endian u32 error code, often followed by facility/file/message ASCII. This decode cracked the SQL and extended-property cases.
- The gRPC RetrievalService string-handle ops do NOT need the WCF `Retr.GetV` prime.
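
A minimal C# sketch of the `BtError` decode described above follows. It assumes the layout given in the list: a `0x84` marker byte, then a little-endian u32 native code, then loosely structured ASCII. `BtErrorDecoder`, `NativeError`, and the 3-character cutoff for ASCII fragments are illustrative choices, not SDK names.

```csharp
#nullable enable
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.Text;

// Sketch only: decodes a gRPC Status.BtError buffer, assuming the layout noted in
// the plan (0x84 marker, little-endian u32 native code, optional trailing ASCII).
// Type and member names are hypothetical, not part of the SDK.
public static class BtErrorDecoder
{
    public const byte NativeErrorMarker = 0x84;

    public sealed record NativeError(uint Code, IReadOnlyList<string> AsciiFragments, string Hex);

    public static NativeError? TryDecode(ReadOnlySpan<byte> buffer)
    {
        // Need the marker byte plus four bytes for the code.
        if (buffer.Length < 5 || buffer[0] != NativeErrorMarker)
            return null;

        uint code = BinaryPrimitives.ReadUInt32LittleEndian(buffer.Slice(1, 4));

        // Collect printable-ASCII runs of 3+ chars from the tail (facility/file/message
        // text). The run-length cutoff is an arbitrary choice for this sketch.
        var fragments = new List<string>();
        var run = new StringBuilder();
        foreach (byte b in buffer.Slice(5))
        {
            if (b >= 0x20 && b < 0x7F)
            {
                run.Append((char)b);
                continue;
            }
            if (run.Length >= 3) fragments.Add(run.ToString());
            run.Clear();
        }
        if (run.Length >= 3) fragments.Add(run.ToString());

        // Keep the full hex alongside the decoded parts for evidence capture.
        return new NativeError(code, fragments, Convert.ToHexString(buffer));
    }
}
```

`BtError` may be a protobuf `bytes` field. If it is, generated C# exposes it as `Google.Protobuf.ByteString`, and its `Span` property can be passed in directly. Log `Code` and `AsciiFragments` next to `Hex`.
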
Proto field-name reference and WCF serializer signatures: see the mapping captured
in `project_grpc_config_ops_tooling` memory and `Grpc/Protos/*.proto`.
## Remaining items (priority order)
### 1. Live-verify the write ops — ✅ DONE 2026-06-22
**Outcome:** ran the gated lifecycle against a synthetic sandbox tag (`ZZ_SdkGrpcWriteProbe`); the
writes flip 🧪→✅. `EnsureTags` (create), `AddTagExtendedProperties`, `StartJob` rename, and
`DeleteTags` all succeed live over gRPC (write-enabled 0x401 session, WCF serializers reused) — NO
priming discovery-dance needed. Two findings: (a) **rename** is an async StartJob that the server can
transiently reject right after the create commits and on target-name collision — the test now
pre-cleans both names and retries rename (4×); callers should likewise retry. (b) **reading a written
extended property back** via `GetTagExtendedPropertiesAsync` hits a shared-parser evidence gap (value
marker `0x01` where the parser expects compact-string `0x09`) — a read-side gap, not a write failure;
the test tolerates it. Lifecycle test is self-cleaning and asserts no litter remains (verified two
consecutive clean passes). Next read-side follow-up: capture the `0x01` extended-property value
encoding and extend `HistorianTagExtendedPropertyProtocol.ParseResponse`.
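
Finding (a) calls for a caller-side retry around rename. The sketch below assumes each attempt reports success as a `bool` (for example `response.Status?.BSuccess == true` from the `StartJob` call). `RenameRetry` and the fixed delay between attempts are hypothetical. Only the 4-attempt count comes from the lifecycle test.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch only: bounded retry for an operation the server may transiently reject
// (e.g. rename right after create commits). Non-transient failures should surface
// as exceptions from the attempt delegate, which propagate unchanged.
public static class RenameRetry
{
    public static async Task<bool> RetryAsync(
        Func<CancellationToken, Task<bool>> attempt,
        int maxAttempts,
        TimeSpan delay,
        CancellationToken ct)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int i = 1; i <= maxAttempts; i++)
        {
            ct.ThrowIfCancellationRequested();
            if (await attempt(ct).ConfigureAwait(false))
                return true;

            // Give the server time to settle before the next attempt (assumed fixed delay).
            if (i < maxAttempts)
                await Task.Delay(delay, ct).ConfigureAwait(false);
        }
        return false;
    }
}
```

A call might look like `await RenameRetry.RetryAsync(t => TryRenameAsync(oldName, newName, t), 4, TimeSpan.FromSeconds(1), ct)`, where `TryRenameAsync` is a hypothetical wrapper around the gRPC rename. Reporting rejection as `false` rather than throwing keeps transport faults distinct from the server's transient "not yet" answers.
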
_Original notes:_
- **Goal:** flip the 🧪 writes to ✅ by running the gated lifecycle test against a sandbox tag.
- **How:** set `HISTORIAN_GRPC_WRITE_SANDBOX_TAG` to a throwaway name and run
`TagWriteLifecycle_OverGrpc_CreatesAddsPropRenamesDeletes` against the live 2023 R2 box.
- **Risk/gotcha:** if any write is rejected, the first fix is to add the WCF write
**priming discovery-dance** (`HistorianWcfTagWriteOrchestrator.RunWritePriming`:
UpdC3 + 6 `GetSystemParameter` + `AllowRenameTags` + Trx/Stat/Retr `GetV`) to
`HistorianGrpcTagWriteOrchestrator` over the gRPC StatusService/HistoryService.
Rename also needs server `AllowRenameTags` enabled. Needs explicit user OK to
mutate the shared server (they previously chose "no live mutate").
- **Files:** `tests/.../HistorianGrpcIntegrationTests.cs` (run only),
`src/.../Grpc/HistorianGrpcTagWriteOrchestrator.cs` (priming only if rejected).
### 2. ReadEvents over gRPC (heaviest read op) — ✅ TOOLED 2026-06-22 (rows pending event-bearing server)
**Outcome:** `ReadEventsAsync` is routed over gRPC (`HistorianGrpcEventOrchestrator`). The CM_EVENT
registration replay (`UpdateClientStatus` → 6 `GetSystemParameter` → `RegisterTags` → cross-service version
probes→`EnsureTags`, captured buffers shared with WCF via `HistorianEventRegistrationProtocol`) runs
and **`StartEventQuery` succeeds live**. The blocker that remains is server behavior, not the port:
`GetNextEventQueryResultBuffer` **long-polls** when the query has no rows — it blocks to the call
deadline instead of returning the synchronous 5-byte type=4 code=85 terminal the 2020 WCF op returns.
Per-call gRPC-Web deadlines proved unreliable over the tunnel (a 4s-deadline chain still ran >90s), so
the read is hard-bounded by an **overall linked-CTS budget** (≤30s, scaled to `RequestTimeout`); gRPC
honors token cancellation. On the no-row path the orchestrator throws `ProtocolEvidenceMissingException`
rather than assert a false-empty list. The idle dev box holds no events, so **row-level retrieval is
not yet live-verified** — flip the gated test
`ReadEventsAsync_OverGrpc_StartsQueryButRowRetrievalIsLongPollBlocked` to assert parsed rows once an
event-bearing 2023 R2 server is available (and consider whether the long-poll needs a "fetch historical
then stop" request flag the native client may set). README row is ⚠️.
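
Here is a sketch of the overall budget. It assumes "scaled to `RequestTimeout`" means the read gets `RequestTimeout` capped at 30s; the real scaling in `HistorianGrpcEventOrchestrator` may differ. `EventReadBudget` is a hypothetical name. The linked source fires when either the caller's token is cancelled or the budget elapses.

```csharp
using System;
using System.Threading;

// Sketch only: one overall budget for the gRPC ReadEvents chain, replacing reliance
// on per-call deadlines. Assumes budget = RequestTimeout capped at 30s.
public static class EventReadBudget
{
    public static readonly TimeSpan MaxBudget = TimeSpan.FromSeconds(30);

    // Non-positive or oversized timeouts fall back to the 30s cap.
    public static TimeSpan Compute(TimeSpan requestTimeout) =>
        requestTimeout > TimeSpan.Zero && requestTimeout < MaxBudget ? requestTimeout : MaxBudget;

    // One token for the whole chain: cancelled when the caller cancels or the budget
    // elapses, whichever comes first. The caller disposes the returned source.
    public static CancellationTokenSource CreateLinked(TimeSpan requestTimeout, CancellationToken callerToken)
    {
        var linked = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        linked.CancelAfter(Compute(requestTimeout));
        return linked;
    }
}
```

Pass `linked.Token` as the `ct` argument of every RPC in the chain. When the budget trips, Grpc.Net.Client surfaces the cancellation as an `RpcException` with `StatusCode.Cancelled` or as an `OperationCanceledException`, depending on client configuration. The orchestrator maps that to `ProtocolEvidenceMissingException` instead of returning an empty list.
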
_Original notes (still the reference for the registration replay):_
- **Goal:** route `ReadEventsAsync` over gRPC.
- **RPCs (exist):** `RetrievalService.StartEventQuery` (`uiHandle`, `uiQueryRequestType`,
`btRequest`) → `{Status, uiQueryHandle, btResonse}`; `GetNextEventQueryResultBuffer`
(`uiHandle`, `uiQueryHandle`) → `{Status, btResult}`; `EndEventQuery`.
- **Reuse:** `HistorianEventQueryProtocol.CreateStartEventQueryAttempts(...)` for the
request buffer (`QueryRequestTypeEvent`), `HistorianEventRowProtocol.Parse(...)` for rows.
- **The hard part — port the CM_EVENT registration state machine.** Without it,
`GetNextEventQueryResultBuffer` returns native error type=4 **code=85**. WCF does this
in `HistorianWcfEventOrchestrator.AddCmEventTagViaAddT`: UpdC3 → 6 system params →
`RegisterTags2` (CM_EVENT tag id `353b8145-5df0-4d46-a253-871aef49b321`, 24-byte
RTag2 buffer) → cross-service `GetV` → `EnsureTags2` (CM_EVENT CTagMetadata via
`HistorianAddTagsProtocol.SerializeCmEventCTagMetadata`). gRPC equivalents:
`HistoryService.RegisterTags`, `HistoryService.EnsureTags`,
`HistoryService.UpdateClientStatus`, `StatusService.GetSystemParameter`.
- **Approach:** new `Grpc/HistorianGrpcEventOrchestrator`. Open a read-only session,
replay the registration over gRPC (RegisterTags + EnsureTags + the discovery calls),
then run StartEventQuery → loop GetNextEventQueryResultBuffer → EndEventQuery, parsing
rows. Route in `Historian2020ProtocolDialect.ReadEventsAsync` on `UseGrpc`.
- **Verify:** live (read-only, safe) against the 2023 R2 box; dev box may return no
rows (env) — assert "no error 85 + chain completes," mirror the WCF event test.
- **Risk:** medium-high. Registration may need exact call ordering; capture the error
buffer (hex+ASCII) at each step if code 85 persists.
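
The StartEventQuery → GetNext loop → EndEventQuery flow can be sketched with the three RPCs abstracted as delegates, so it runs without a server. Two assumptions go into it:
- The WCF-style terminal is a 5-byte buffer: a type byte (4) followed by a little-endian u32 code (85), mirroring the `BtError` layout. That layout is an assumption.
- The notes also tie code 85 to missing CM_EVENT registration. A real port would need to tell those two cases apart; this sketch does not.

On the 2023 R2 gRPC path the terminal never arrives, so the token passed in must carry the overall budget. `EventQueryLoop` is a hypothetical name.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Sketch only: StartEventQuery -> GetNext loop -> EndEventQuery with the RPCs
// abstracted as delegates. The terminal layout (type byte 4, then LE u32 code 85)
// is an assumption modeled on the 2020 WCF behavior.
public static class EventQueryLoop
{
    public static bool IsType4Code85Terminal(byte[] buffer) =>
        buffer is { Length: 5 }
        && buffer[0] == 4
        && BinaryPrimitives.ReadUInt32LittleEndian(buffer.AsSpan(1, 4)) == 85u;

    public static async Task<List<byte[]>> RunAsync(
        Func<CancellationToken, Task<uint>> startQuery,
        Func<uint, CancellationToken, Task<byte[]>> getNextBuffer,
        Func<uint, Task> endQuery,
        CancellationToken ct)
    {
        uint queryHandle = await startQuery(ct).ConfigureAwait(false);
        var rowBuffers = new List<byte[]>();
        try
        {
            while (true)
            {
                // Over 2023 R2 gRPC this call long-polls when there are no rows,
                // so ct should carry the overall budget.
                ct.ThrowIfCancellationRequested();
                byte[] buffer = await getNextBuffer(queryHandle, ct).ConfigureAwait(false);
                if (IsType4Code85Terminal(buffer))
                    break;
                rowBuffers.Add(buffer); // raw row buffers, parsed separately
            }
        }
        finally
        {
            // Always release the server-side query, even when the budget cancels the loop.
            await endQuery(queryHandle).ConfigureAwait(false);
        }
        return rowBuffers;
    }
}
```

The returned buffers would then go through `HistorianEventRowProtocol.Parse(...)`. `endQuery` deliberately takes no token, so cleanup still runs after the budget has tripped.
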
### 3. SendEvent over gRPC
- **Goal:** route `SendEventAsync` over gRPC.
- **Blocker:** no distinct event-send RPC; WCF rides `AddStreamValues2` (the
`HistorianEventWriteProtocol.SerializeAddStreamValuesBuffer` VTQ). The gRPC framing is
**uncaptured** — needs a native-client gRPC capture before implementing (per
"capture first, never guess"). Depends on #2 (same CM_EVENT registration).
- **Risk:** high / blocked on capture. Lowest priority.
### 4. (Stretch) SQL server-wall investigation
- `ExecuteSqlCommand` over gRPC faults server-side in `CSrvDbConnection.ExecuteSqlCommand`
(IndexOutOfRange / native err 38) — a DB-connection precondition the managed session
doesn't establish. Next avenue: try a `HistoryService.RegisterTags`-family prime before
`ExecuteSqlCommand` (same fix that unblocked the M3 write path / OpenStorageConnection
class of wall). If it works, replace the bounded throw in `HistorianGrpcSqlClient` with
the real GetNextQueryResultBuffer fetch loop (already written there) and flip the test.
### 5. (Optional) GetConnectionStatus over gRPC
- Currently WCF-only, synthesized from an authenticated probe (neither transport has a
  dedicated RPC). Could synthesize the same over gRPC via `StatusService.PingServer` /
`GetHistorianConsoleStatus`. Low value; do only if parity is wanted.
### Out of scope
- `ReadBlocks` (`StartBlockRetrievalQuery`) — never captured on either transport; leave
throwing `ProtocolEvidenceMissingException`.
- `DeleteTagExtendedProperties` — server-blocked on WCF (per-connection working set);
gRPC's single multiplexed channel *might* fix it — opportunistic probe only.
## Live verification setup (every live run)
Tunnel to `WONDER-SQL-VD03` must be up (gRPC `localhost:32565`, TLS, cert CN
`WONDER-SQL-VD03`; hosts entry present). Creds in gitignored `wonder-sql-vd03.txt`
(**QUOTED, colon-delimited** — strip quotes; use the `domainusername`/`domainpassword`
NAM domain account, which works for Historian gRPC; `wonderapp` does NOT). Env:
```
HISTORIAN_GRPC_HOST=wonder-sql-vd03 HISTORIAN_GRPC_PORT=32565
HISTORIAN_GRPC_TLS=true HISTORIAN_GRPC_DNSID=WONDER-SQL-VD03
HISTORIAN_USER=<domain user> HISTORIAN_PASSWORD=<domain pass>
HISTORIAN_TEST_TAG=SysTimeSec
# writes only, destructive: HISTORIAN_GRPC_WRITE_SANDBOX_TAG=<throwaway>
# slow links: HISTORIAN_GRPC_TIMEOUT=120
```
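
The sketch below gathers those variables into one settings object, so live tests can skip cleanly when the tunnel is not configured. It assumes `HISTORIAN_GRPC_TIMEOUT` is in seconds and treats a missing host, port, user, or password as "not configured". `LiveGrpcSettings` is a hypothetical type, not the SDK's options class.

```csharp
#nullable enable
using System;
using System.Globalization;

// Sketch only: live-run settings from environment variables. Assumes
// HISTORIAN_GRPC_TIMEOUT is in seconds. LiveGrpcSettings is hypothetical.
public sealed record LiveGrpcSettings(
    string Host,
    int Port,
    bool UseTls,
    string? DnsIdentity,
    string User,
    string Password,
    string? TestTag,
    string? WriteSandboxTag, // destructive runs only
    TimeSpan? Timeout)
{
    public static LiveGrpcSettings? TryLoad(Func<string, string?> getEnv)
    {
        string? host = getEnv("HISTORIAN_GRPC_HOST");
        string? user = getEnv("HISTORIAN_USER");
        string? password = getEnv("HISTORIAN_PASSWORD");
        if (string.IsNullOrWhiteSpace(host) || string.IsNullOrEmpty(user) || string.IsNullOrEmpty(password))
            return null; // not configured: callers should skip live tests

        if (!int.TryParse(getEnv("HISTORIAN_GRPC_PORT"), NumberStyles.None, CultureInfo.InvariantCulture, out int port))
            return null;

        bool useTls = string.Equals(getEnv("HISTORIAN_GRPC_TLS"), "true", StringComparison.OrdinalIgnoreCase);

        TimeSpan? timeout = int.TryParse(getEnv("HISTORIAN_GRPC_TIMEOUT"), NumberStyles.None, CultureInfo.InvariantCulture, out int seconds)
            ? TimeSpan.FromSeconds(seconds)
            : null;

        return new LiveGrpcSettings(
            host, port, useTls, getEnv("HISTORIAN_GRPC_DNSID"),
            user, password, getEnv("HISTORIAN_TEST_TAG"),
            getEnv("HISTORIAN_GRPC_WRITE_SANDBOX_TAG"), timeout);
    }

    // Keep credentials out of test output (the record's default ToString prints every property).
    public override string ToString() => $"{Host}:{Port} tls={UseTls}";
}
```

In a test this would be `LiveGrpcSettings.TryLoad(Environment.GetEnvironmentVariable)`, skipping when it returns `null`. The `ToString` override follows the standing rule against leaking credentials into logs or commits.
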
Run a subset: `dotnet test ./Histsdk.slnx --no-build --filter "FullyQualifiedName~<name>"`.
Aggregate tests self-calibrate their window from a real raw sample (the box is idle and
not collecting). Before any commit, run the sanitization scan
`wonder-sql-vd03|zimmer|nam\\|dohertj2|ADOBuild` over commit-safe files.
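
A self-contained sketch of that scan using .NET `Regex` follows; it reports 1-based line numbers per match. Case-insensitive matching is an assumption, since the hostname appears in both cases in this plan. Choosing which files count as commit-safe is left to the caller, and `SanitizationScan` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sketch only: applies the plan's sanitization pattern to file text.
// Case-insensitive matching is an assumption.
public static class SanitizationScan
{
    // Verbatim string: the regex receives nam\\ , i.e. "nam" followed by a literal backslash.
    private static readonly Regex Forbidden = new(
        @"wonder-sql-vd03|zimmer|nam\\|dohertj2|ADOBuild",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static IEnumerable<(int Line, string Match)> Scan(string text)
    {
        string[] lines = text.Split('\n');
        for (int i = 0; i < lines.Length; i++)
        {
            foreach (Match m in Forbidden.Matches(lines[i]))
                yield return (i + 1, m.Value);
        }
    }
}
```

Any hit should block the commit until the value is replaced with a placeholder.
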
## Standing constraints
- Never commit credentials/hostnames/customer tag names/raw captures — placeholders only.
- `src/` stays pure managed .NET 10 (one allowed P/Invoke: SSPI). Never modify `current/`
or `aveva-install-*/`.
- Commit only when asked; branch first if on `main`; required footers
(Co-Authored-By + Claude-Session). Capture wire bytes before implementing — never guess.