R4.3: gRPC store-forward status probe + re-scope

Add HistorianGrpcStoreForwardStatusProbe and the `grpc-sf-status-probe` CLI
command. The idle-baseline run against the live 2023 R2 server resolves the
plan's §9.3 handle question: the direct StorageService SF pull RPCs
(GetSFParameter / GetRemainingSnapshotsSize) require the OpenStorageConnection
console handle and are D2-gated (err 132, identical under read-only and
write-enabled sessions), while StatusService.GetHistorianConsoleStatus IS
reachable on the session string handle (=3 at idle).

Records the gRPC re-scope and the idle-baseline findings in
docs/plans/store-forward-cache-reverse-engineering.md §9. The probe writes
nothing and releases any console session immediately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
Joseph Doherty
2026-06-21 23:14:05 -04:00
parent f840af5873
commit c2d8fb9bc8
3 changed files with 619 additions and 1 deletions
@@ -1,6 +1,18 @@
# Store/Forward Cache Reverse-Engineering Plan
Last updated: 2026-05-04
Last updated: 2026-06-21
> **2026-06-21 R4.3 re-scope — read this first.** The original plan below
> (2026-05-04) was written against the 2020 Net.TCP/WCF transport, before the
> 2023 R2 gRPC transport existed. Its single biggest open risk — *"is SF state
> readable via a one-shot pull, or only via a duplex push contract we'd have to
> add?"* (Q1/Q2 + §3 Step 3 + Risk 4) — is now **answered: pull, no duplex**.
> The recovered gRPC `StorageService` contract exposes SF state as plain
> request/response RPCs. The current R4.3 scope and recommended path are in
> §9 ("2026-06-21 gRPC re-scope"); the 2020-WCF body below is retained as
> background, not the recommended route.
Original last-updated: 2026-05-04
This document plans the reverse-engineering effort needed to replace the
synthesized `GetStoreForwardStatusAsync` in
@@ -499,3 +511,202 @@ Explicitly not part of this plan:
- Anything in the
`aahClientCommon.CSFConnection.StartStoreforward` /
`SetStorageStopped` / `SetTagSynchronized` write surface.
## 9. 2026-06-21 gRPC re-scope (current R4.3 plan)
This supersedes the recommended *route* in §2/§3/§4. The deliverable
(§1) and success criteria (§6) are unchanged. What changed is the
transport and the resolved architecture risk.
### 9.1 What the recovered gRPC contract already gives us
The 2023 R2 contract under `src/AVEVA.Historian.Client/Grpc/Protos/`
exposes SF state through **first-class pull RPCs** on `StorageService`
(`StorageService.proto`) — no duplex/callback contract, no native
`HISTORIAN_STORAGE_STATUS` C-struct decode:
- `GetSFParameter(uint32 Handle, string ParameterName)
→ (Status status, string ParamaterValue)` — the direct analogue of the
already-shipped `GetSystemParameter`/`GetRuntimeParameter` string-keyed
pulls. This is the primary SF-state lever: a name→value read.
- `GetRemainingSnapshotsSize(uint32 Handle)
→ (Status status, uint64 SnapshotSize)` — the pending-buffer magnitude
in one call. Non-zero ⇒ data is queued (`Pending`/`DataStored=true`);
zero ⇒ drained. The cleanest single signal for the idle-vs-active split.
- `GetInfo(string Request) → (Status status, bytes info)` — generic
server info blob; a fallback if a named SF key lives here instead of in
`GetSFParameter`.
- `OpenStorageConnectionResponse.ServerStatus` (field 5) and the
`GetSnapshots`/`StartQuerySnapshot` family — secondary signals.
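The `GetRemainingSnapshotsSize` interpretation above reduces to a small mapping: non-zero means queued, and zero means drained. The following is a minimal sketch, not the shipped mapping. It assumes a hypothetical `SfSnapshotFlags` record whose fields mirror two `HistorianStoreForwardStatus` flags, and it treats a failed call (`status.bSuccess == false`) as "no information" rather than as "drained".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SfSnapshotFlags:
    """Hypothetical subset of HistorianStoreForwardStatus fields."""
    pending: bool
    data_stored: bool


def flags_from_remaining_size(success: bool, snapshot_size: int) -> Optional[SfSnapshotFlags]:
    """Map a GetRemainingSnapshotsSize reply onto pending/drained flags.

    Returns None when the call failed, because a failed read says nothing
    about the queue.
    """
    if not success:
        return None
    if snapshot_size < 0 or snapshot_size >= 2**64:
        raise ValueError("SnapshotSize is a uint64")
    queued = snapshot_size > 0  # non-zero => data is queued; zero => drained
    return SfSnapshotFlags(pending=queued, data_stored=queued)
```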
`SetSFParameter` exists too but is **out of scope** (read-only mission, §8).
The `TransactionService.ForwardSnapshot{,Begin,End}` RPCs are the SF
cache *replay/transfer* path (write-side), **not** a status read — also
out of scope here; they belong to the deferred bit-faithful SF cache work,
not to `GetStoreForwardStatusAsync`.
### 9.2 Plumbing that already exists (reuse, don't rebuild)
- `HistorianGrpcHandshake.OpenSession` — authenticated gRPC session
(`ValidateClientCredential` NTLM loop + Open2) yielding `ClientHandle`
(uint) + storage-session GUID. Live-verified against the 2023 R2 box.
- `HistorianGrpcStorageConnectionProbe` — already constructs a
`StorageService.StorageServiceClient`, primes `GetInterfaceVersion`, and
calls `OpenStorageConnection`/`CloseStorageConnection`. The SF-status
probe is a near-clone that swaps the `OpenStorageConnection` body for
`GetSFParameter`/`GetRemainingSnapshotsSize` calls.
- `HistorianGrpcChannelFactory` / `HistorianGrpcConnection` — channel,
metadata, deadlines.
### 9.3 The one open risk that survives: which `Handle`?
`GetSFParameter`/`GetRemainingSnapshotsSize` both take `uint32 Handle`.
Unknown: do they accept the **session `ClientHandle`** (from
`OpenSession`, which is cheap and unblocked), or do they require the
**storage console `Handle`** returned by `OpenStorageConnection`? The
latter is the D2 wall: `OpenStorageConnection` routes to the
`\\.\pipe\aahStorageEngine\console` session, which uses the same
storage-engine pipe that blocks revision writes. See
[[project_roadmap_exhausted_2020wcf]] and `HistorianGrpcStorageConnectionProbe`
header.
- **Best case:** these read-only status RPCs accept the session
`ClientHandle` (status reads shouldn't need a console writer session).
Then R4.3-over-gRPC is unblocked end-to-end and is a small, shippable
feature.
- **Worst case:** they require the `OpenStorageConnection` `Handle` ⇒
R4.3 inherits the D2 storage-engine-pipe wall and stays blocked on the
same root cause as R4.2. Either way the probe answers it in one run.
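The two branches can be expressed as a decision over the outcomes of two probe calls. This is a minimal sketch under stated assumptions. The `CallOutcome` enum and `classify_handle_branch` are hypothetical names, not part of the client. Each RPC reply is collapsed into success, a handle- or auth-shaped failure, or some other error.

```python
from enum import Enum, auto
from typing import Optional


class CallOutcome(Enum):
    """Hypothetical collapse of one RPC reply for this decision."""
    OK = auto()
    HANDLE_REJECTED = auto()  # auth- or handle-shaped error
    OTHER_ERROR = auto()      # anything else; says nothing about the handle


def classify_handle_branch(with_session_handle: CallOutcome,
                           with_console_handle: Optional[CallOutcome]) -> str:
    """Return "best", "worst", or "inconclusive" for the handle question.

    with_console_handle is None when the OpenStorageConnection handle could
    not be obtained, so the retry was impossible.
    """
    if with_session_handle is CallOutcome.OK:
        return "best"  # status reads accept the cheap session ClientHandle
    if with_session_handle is CallOutcome.HANDLE_REJECTED:
        # Session handle refused: the RPCs want the console handle, which is
        # either confirmed by a successful retry or unobtainable (D2 wall).
        if with_console_handle is None or with_console_handle is CallOutcome.OK:
            return "worst"
    return "inconclusive"
```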
### 9.4 Discovery steps (execution order)
1. **Add `grpc-sf-status-probe` to `tools/AVEVA.Historian.ReverseEngineering`**
(mirror `HistorianGrpcStorageConnectionProbe`). Against the live 2023 R2
server it:
- opens an authenticated session, gets `ClientHandle`;
- calls `GetRemainingSnapshotsSize(ClientHandle)` and reports
`status.bSuccess` + `SnapshotSize` + any error buffer;
- sweeps `GetSFParameter(ClientHandle, name)` over a candidate
name list (`Status`, `Storing`, `Pending`, `DataStored`,
`SF.Status`, `StoreForwardStatus`, `Forward`, `CacheSize`,
`ErrorOccurred`, plus any names surfaced by Workstream A's IL of
`ConvertUnmanagedSFStorageStatusToManagedStorageStatus`);
- records which names the server accepts and the returned values.
- If every call fails with an auth/handle-shaped error, retry once
with the `OpenStorageConnection` `Handle` to disambiguate §9.3.
2. **Idle baseline first** — run against the server with SF *not* active.
Establishes the "no SF / drained" response shape (expected:
`SnapshotSize=0`, parameter reads succeed-with-defaults or
return a "not configured" sentinel). This alone may be enough to ship
an honest idle-state implementation that is strictly better than
today's hardcoded all-false synthesis (it would be *measured* false).
3. **Active-SF capture** — only if step 2 proves the read works and we
need the active-state fixtures. Force SF on the sacrificial Historian
VM (stop Runtime DB writer; let the queue spill to SF), re-run the
probe, capture the non-zero/`Storing=true` response. This is the one
invasive step and the gate on full success criteria §6.1–6.3.
4. **Map + implement** — add `GrpcGetStoreForwardStatus` to the gRPC
read orchestrator, map the probed fields onto
`HistorianStoreForwardStatus`, route `GetStoreForwardStatusAsync`
to it when `Transport == RemoteGrpc` (keep the synthesized fallback
for non-gRPC transports and for the "no SF configured" sentinel).
Add golden-byte fixtures (idle + active) and
`WcfStoreForwardStatusProtocolTests`-style parse tests. Gate the live
integration test on `HISTORIAN_GRPC_HOST`.
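Step 1's parameter-name sweep is a loop that records which names the server accepts and what it returns. The following is a minimal sketch. It assumes the RPC is injected as a hypothetical `get_sf_parameter(handle, name)` callable that returns `(success, value)` and raises on a server-side handler exception. The real probe calls `StorageService.GetSFParameter` through the generated gRPC client. The candidate list below contains only the names spelled out in step 1.

```python
from typing import Callable, Dict, List, Tuple

CANDIDATE_NAMES = [
    "Status", "Storing", "Pending", "DataStored", "SF.Status",
    "StoreForwardStatus", "Forward", "CacheSize", "ErrorOccurred",
]

# Hypothetical injected RPC: (handle, name) -> (bSuccess, ParamaterValue)
GetSfParameter = Callable[[int, str], Tuple[bool, str]]


def sweep_sf_parameters(get_sf_parameter: GetSfParameter, handle: int,
                        names: List[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Call get_sf_parameter once per name.

    Returns (accepted, rejected): accepted maps name -> returned value;
    rejected maps name -> a short reason (bSuccess=false or exception text).
    """
    accepted: Dict[str, str] = {}
    rejected: Dict[str, str] = {}
    for name in names:
        try:
            ok, value = get_sf_parameter(handle, name)
        except Exception as exc:  # e.g. a server-side handler exception
            rejected[name] = f"exception: {exc}"
            continue
        if ok:
            accepted[name] = value
        else:
            rejected[name] = "bSuccess=false"
    return accepted, rejected
```

Keeping rejections together with their reasons makes it easy to tell a handle-shaped failure on every name apart from a vocabulary miss on some names.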
### 9.5 Effort / feasibility summary
- **Risk collapsed:** pull-vs-push (the old plan's worst risk) is settled
— it's a pull. No duplex WCF/gRPC callback contract.
- **No native struct decode:** `GetSFParameter` returns a *string*; we
skip the `HISTORIAN_STORAGE_STATUS` C-layout RE entirely (Workstream
A.2 / D.1 become "nice-to-have for field names", not blocking).
- **Reuses shipped plumbing:** session open + `StorageServiceClient` +
channel already exist and are live-verified.
- **Remaining unknowns are empirical, one probe-run each:** (a) the
accepted parameter-name vocabulary, (b) which `Handle` the status RPCs
want (§9.3 — the only thing that could re-block it), (c) the
active-SF response shape (needs the invasive force-SF step).
- **Net:** Steps 1–2 are low-risk and could land a *measured* idle-state
`GetStoreForwardStatusAsync` over gRPC quickly. Steps 3–4 (full
success criteria) still need the sacrificial-VM force-SF capture and
are gated on §9.3 not landing on the D2 wall.
### 9.6 Out of scope (unchanged from §8, restated for gRPC)
`SetSFParameter`, `ForwardSnapshot*` (SF replay/transfer), the on-disk
cache file format, and redundant-partner SF aggregation all remain out of
scope. R4.3 is read-only status, gRPC-first.
### 9.7 Idle-baseline run — RESULTS (2026-06-21)
Built `HistorianGrpcStoreForwardStatusProbe` + the `grpc-sf-status-probe`
CLI command and ran it against the **live 2023 R2 server** with the
historian in its **idle / not-actively-storing** state (storage interface
v4, authenticated session opened OK). Tested both read-only (`0x402`) and
write-enabled (`0x401`) sessions. Findings, with the §9.3 handle question
**resolved**:
1. **Direct `StorageService` SF pull RPCs are D2-gated — confirmed the
§9.3 worst-case branch.**
- `GetRemainingSnapshotsSize(session.ClientHandle)` →
`bSuccess=false`, error buffer `04 84 00 00 00` (= status `0x84` /
**132 `OperationNotEnabled`**). **Identical under `0x401` and
`0x402`** — so it is NOT the read/write connection-mode gate; the
History-session `ClientHandle` is simply not a valid handle for this
op's handle-space.
- `GetSFParameter(session.ClientHandle, <name>)` → server-side
`RpcException(Unknown, "Exception was thrown by handler")` for **all
16** candidate names, both session modes.
- These two ops need the **`OpenStorageConnection` console handle**,
and `OpenStorageConnection` itself fails with the storage-engine
console error (`84 55 00 00 00 01 02 00 09 15 00`
+ ASCII `"OpenStorageConnection"`) — the **D2 storage-engine-pipe
wall**, the same root cause that blocks R4.2 revision writes. We
cannot obtain the console handle, so these two SF RPCs are
unreachable from a pure managed client. See
[[project_roadmap_exhausted_2020wcf]].
2. **One reachable session-handle lever found:**
`StatusService.GetHistorianConsoleStatus(strHandle)` **SUCCEEDS** with
the session string handle (uppercase Open2 GUID) — no console handle
needed — and returns `uiConsoleStatus = 3` at idle. This is the only
SF-adjacent signal reachable from the managed client. **Its enum
semantics are unknown** (3 = presumably "running/normal"); whether it
shifts when SF is actively storing is the open question.
3. `StatusService.GetHistorianInfo(strHandle, btRequest)` →
`bSuccess=false` for every `btRequest` candidate (empty, `u32(0)`, and
ASCII and UTF-16 `"StoreForward"`); its request framing is not yet known.
This is lower-yield than `GetHistorianConsoleStatus`; revisit only if needed.
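The status in finding 1 can be read from the five-byte error buffer as a little-endian uint32 that follows a one-byte prefix. The following is a minimal sketch that assumes this layout. The meaning of the leading `0x04` byte is not established here, and the sketch maps only the one code this run observed. It does not attempt to decode the longer `OpenStorageConnection` buffer.

```python
import struct
from typing import Tuple

# Only the status code observed in the idle-baseline run is mapped.
KNOWN_STATUS = {0x84: "OperationNotEnabled"}  # 0x84 == 132


def decode_short_error_buffer(buf: bytes) -> Tuple[int, int, str]:
    """Split a 5-byte error buffer into (prefix, status, status_name).

    Assumed layout: one prefix byte of unknown meaning, then the status as a
    little-endian uint32.
    """
    if len(buf) != 5:
        raise ValueError(f"expected 5 bytes, got {len(buf)}")
    prefix = buf[0]
    (status,) = struct.unpack_from("<I", buf, 1)
    return prefix, status, KNOWN_STATUS.get(status, "unknown")
```

For example, `decode_short_error_buffer(bytes.fromhex("04 84 00 00 00"))` returns `(4, 132, "OperationNotEnabled")`.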
**Net idle-baseline conclusion.** R4.3's clean direct route
(`GetSFParameter` / `GetRemainingSnapshotsSize`) is **blocked behind the
D2 storage-engine console pipe**, exactly like R4.2 — a pure managed
client cannot open the console session those ops require. The *only*
reachable SF-adjacent signal is `GetHistorianConsoleStatus` → a status
uint. Two paths forward:
- **(a) Ship a measured idle state only: SHIPPED + LIVE-VERIFIED 2026-06-21.**
`HistorianGrpcStatusClient.GetStoreForwardStatusAsync` opens a session,
calls `GetHistorianConsoleStatus`, and returns
`HistorianStoreForwardStatus` all-false but *measured*: it actually
contacts the server and reports `ErrorOccurred=true` (with the underlying
error) when the server is unreachable / the console-status call fails —
strictly better than the blind hardcoded synthesis, which never contacts
the server. Routed via `Historian2020ProtocolDialect.GetStoreForwardStatusAsync`
when `Transport == RemoteGrpc` (non-gRPC keeps the synthesized fallback).
Gated live test `HistorianGrpcIntegrationTests.GetStoreForwardStatusAsync_OverGrpc_ReturnsMeasuredIdleState`
passes against the real 2023 R2 server. `Storing`/`Pending`/`DataStored`
magnitude is intentionally NOT surfaced — it lives behind the D2 wall (see
path (b)).
- **(b) Full success criteria (§6) stay blocked** on the D2 console-pipe
wall. Decoding the active-SF `uiConsoleStatus` value and any
`GetSystemParameter` SF keys still needs the invasive force-SF capture
on a sacrificial Historian — and even then `Storing`/`DataStored`
magnitude is only available via the D2-gated `GetRemainingSnapshotsSize`.
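Path (a)'s contract can be sketched as follows. The gRPC transport contacts the server and reports all-false flags, and `ErrorOccurred` reflects whether the console-status call succeeded. Other transports keep the synthesized result, which never contacts the server. This is a minimal sketch and not the shipped C# code. It assumes a hypothetical injected `get_console_status()` callable, a Python stand-in for `HistorianStoreForwardStatus` with assumed field names, and the string `"RemoteGrpc"` in place of the transport enum.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class StoreForwardStatus:
    """Python stand-in for HistorianStoreForwardStatus (fields assumed)."""
    storing: bool = False
    pending: bool = False
    data_stored: bool = False
    error_occurred: bool = False
    error: Optional[str] = None
    console_status: Optional[int] = None  # raw uiConsoleStatus; semantics unknown


def measured_idle_status(get_console_status: Callable[[], int]) -> StoreForwardStatus:
    """Contact the server, but never infer Storing/Pending/DataStored.

    Those magnitudes sit behind the D2-gated RPCs, so they stay False.
    """
    try:
        console_status = get_console_status()
    except Exception as exc:  # unreachable server or failed status call
        return StoreForwardStatus(error_occurred=True, error=str(exc))
    return StoreForwardStatus(console_status=console_status)


def route_store_forward_status(transport: str,
                               get_console_status: Callable[[], int]) -> StoreForwardStatus:
    if transport == "RemoteGrpc":
        return measured_idle_status(get_console_status)
    return StoreForwardStatus()  # synthesized fallback; no server contact
```

Injecting the status call keeps the "measured, not blind" distinction testable without a live server.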
Probe code: `src/AVEVA.Historian.Client/Grpc/HistorianGrpcStoreForwardStatusProbe.cs`,
CLI `grpc-sf-status-probe <host> [port] [--tls] [--dnsid <n>] [--write-session]`.
Writes nothing; releases any console session immediately.
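The CLI synopsis above maps onto a small argument parser. The following Python sketch of that synopsis uses `argparse` and makes three assumptions. First, `--write-session` selects the write-enabled `0x401` mode, and omitting it selects the read-only `0x402` mode tested in the idle-baseline run. Second, `--dnsid` takes an integer. Third, the port default is left unset. The real command is implemented in the C# tool, and its defaults are not documented here.

```python
import argparse

READ_ONLY_MODE = 0x402      # read-only session, per the idle-baseline run
WRITE_ENABLED_MODE = 0x401  # write-enabled session


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="grpc-sf-status-probe")
    p.add_argument("host")
    p.add_argument("port", nargs="?", type=int, default=None)  # real default unknown
    p.add_argument("--tls", action="store_true")
    p.add_argument("--dnsid", type=int, metavar="n")  # integer type is assumed
    p.add_argument("--write-session", action="store_true")
    return p


def session_mode(args: argparse.Namespace) -> int:
    # Assumed mapping of the flag onto the two session modes.
    return WRITE_ENABLED_MODE if args.write_session else READ_ONLY_MODE
```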