R4.3: gRPC store-forward status probe + re-scope
Add HistorianGrpcStoreForwardStatusProbe and the `grpc-sf-status-probe` CLI command.

The idle-baseline run against the live 2023 R2 server resolves the plan's §9.3 handle question: the direct StorageService SF pull RPCs (GetSFParameter / GetRemainingSnapshotsSize) require the OpenStorageConnection console handle and are D2-gated (err 132, identical under read-only and write-enabled sessions), while StatusService.GetHistorianConsoleStatus IS reachable on the session string handle (=3 at idle).

Records the gRPC re-scope and the idle-baseline findings in docs/plans/store-forward-cache-reverse-engineering.md §9. The probe writes nothing and releases any console session immediately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
@@ -1,6 +1,18 @@
# Store/Forward Cache Reverse-Engineering Plan

Last updated: 2026-05-04
Last updated: 2026-06-21

> **2026-06-21 R4.3 re-scope: read this first.** The original plan below
> (2026-05-04) was written against the 2020 Net.TCP/WCF transport, before the
> 2023 R2 gRPC transport existed. Its single biggest open risk, *"is SF state
> readable via a one-shot pull, or only via a duplex push contract we'd have to
> add?"* (Q1/Q2 + §3 Step 3 + Risk 4), is now **answered: pull, no duplex**.
> The recovered gRPC `StorageService` contract exposes SF state as plain
> request/response RPCs. The current R4.3 scope and recommended path are in
> §9 ("2026-06-21 gRPC re-scope"); the 2020-WCF body below is retained as
> background, not the recommended route.

Original last-updated: 2026-05-04

This document plans the reverse-engineering effort needed to replace the
synthesized `GetStoreForwardStatusAsync` in
@@ -499,3 +511,202 @@ Explicitly not part of this plan:
- Anything in the
  `aahClientCommon.CSFConnection.StartStoreforward` /
  `SetStorageStopped` / `SetTagSynchronized` write surface.

## 9. 2026-06-21 gRPC re-scope (current R4.3 plan)

This supersedes the recommended *route* in §2/§3/§4. The deliverable
(§1) and success criteria (§6) are unchanged. What changed is the
transport and the resolved architecture risk.

### 9.1 What the recovered gRPC contract already gives us

The 2023 R2 contract under `src/AVEVA.Historian.Client/Grpc/Protos/`
exposes SF state through **first-class pull RPCs** on `StorageService`
(`StorageService.proto`), with no duplex/callback contract and no native
`HISTORIAN_STORAGE_STATUS` C-struct decode:

- `GetSFParameter(uint32 Handle, string ParameterName)
  → (Status status, string ParamaterValue)`: the direct analogue of the
  already-shipped `GetSystemParameter`/`GetRuntimeParameter` string-keyed
  pulls. This is the primary SF-state lever: a name→value read.
- `GetRemainingSnapshotsSize(uint32 Handle)
  → (Status status, uint64 SnapshotSize)`: the pending-buffer magnitude
  in one call. Non-zero ⇒ data is queued (`Pending`/`DataStored=true`);
  zero ⇒ drained. The cleanest single signal for the idle-vs-active split.
- `GetInfo(string Request) → (Status status, bytes info)`: generic
  server info blob; a fallback if a named SF key lives here instead of in
  `GetSFParameter`.
- `OpenStorageConnectionResponse.ServerStatus` (field 5) and the
  `GetSnapshots`/`StartQuerySnapshot` family: secondary signals.

`SetSFParameter` exists too but is **out of scope** (read-only mission, §8).

The `TransactionService.ForwardSnapshot{,Begin,End}` RPCs are the SF
cache *replay/transfer* path (write-side), **not** a status read. They are
also out of scope here; they belong to the deferred bit-faithful SF cache
work, not to `GetStoreForwardStatusAsync`.
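
The reading rule for `GetRemainingSnapshotsSize` (non-zero means data is queued, zero means drained) can be sketched in a few lines. This is an illustrative, language-neutral Python sketch rather than project code: `RemainingSnapshotsReply` and `classify_backlog` are hypothetical names, and mapping a failed call to "unknown" instead of zero is an assumption about how a caller would want to treat `bSuccess=false`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemainingSnapshotsReply:
    """Hypothetical stand-in for the GetRemainingSnapshotsSize response."""
    success: bool       # mirrors status.bSuccess
    snapshot_size: int  # mirrors the uint64 SnapshotSize field


def classify_backlog(reply: RemainingSnapshotsReply) -> str:
    """Map a reply onto 'unknown', 'drained', or 'pending'."""
    if not reply.success:
        # Assumption: a failed call says nothing about the queue,
        # so it must not be reported as an empty backlog.
        return "unknown"
    return "pending" if reply.snapshot_size > 0 else "drained"
```

Keeping "unknown" separate from "drained" is what lets a caller tell a measured zero apart from a call that never produced a measurement.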

### 9.2 Plumbing that already exists (reuse, don't rebuild)

- `HistorianGrpcHandshake.OpenSession`: authenticated gRPC session
  (`ValidateClientCredential` NTLM loop + Open2) yielding `ClientHandle`
  (uint) + storage-session GUID. Live-verified against the 2023 R2 box.
- `HistorianGrpcStorageConnectionProbe`: already constructs a
  `StorageService.StorageServiceClient`, primes `GetInterfaceVersion`, and
  calls `OpenStorageConnection`/`CloseStorageConnection`. The SF-status
  probe is a near-clone that swaps the `OpenStorageConnection` body for
  `GetSFParameter`/`GetRemainingSnapshotsSize` calls.
- `HistorianGrpcChannelFactory` / `HistorianGrpcConnection`: channel,
  metadata, deadlines.

### 9.3 The one open risk that survives: which `Handle`?

`GetSFParameter`/`GetRemainingSnapshotsSize` both take `uint32 Handle`.
Unknown: do they accept the **session `ClientHandle`** (from
`OpenSession`, which is cheap and unblocked), or do they require the
**storage console `Handle`** returned by `OpenStorageConnection`, which
is the D2 wall (`OpenStorageConnection` routes to the
`\\.\pipe\aahStorageEngine\console` session and is the same storage-engine
pipe that blocks revision writes)? See
[[project_roadmap_exhausted_2020wcf]] and `HistorianGrpcStorageConnectionProbe`
header.

- **Best case:** these read-only status RPCs accept the session
  `ClientHandle` (status reads shouldn't need a console writer session).
  Then R4.3-over-gRPC is unblocked end-to-end and is a small, shippable
  feature.
- **Worst case:** they require the `OpenStorageConnection` `Handle` ⇒
  R4.3 inherits the D2 storage-engine-pipe wall and stays blocked on the
  same root cause as R4.2. Either way the probe answers it in one run.

### 9.4 Discovery steps (execution order)

1. **Add `grpc-sf-status-probe` to `tools/AVEVA.Historian.ReverseEngineering`**
   (mirror `HistorianGrpcStorageConnectionProbe`). Against the live 2023 R2
   server it:
   - opens an authenticated session, gets `ClientHandle`;
   - calls `GetRemainingSnapshotsSize(ClientHandle)` and reports
     `status.bSuccess` + `SnapshotSize` + any error buffer;
   - sweeps `GetSFParameter(ClientHandle, name)` over a candidate
     name list (`Status`, `Storing`, `Pending`, `DataStored`,
     `SF.Status`, `StoreForwardStatus`, `Forward`, `CacheSize`,
     `ErrorOccurred`, plus any names surfaced by Workstream A's IL of
     `ConvertUnmanagedSFStorageStatusToManagedStorageStatus`);
   - records which names the server accepts and the returned values;
   - if every call fails with an auth/handle-shaped error, retries once
     with the `OpenStorageConnection` `Handle` to disambiguate §9.3.
2. **Idle baseline first**: run against the server with SF *not* active.
   Establishes the "no SF / drained" response shape (expected:
   `SnapshotSize=0`, parameter reads succeed-with-defaults or
   return a "not configured" sentinel). This alone may be enough to ship
   an honest idle-state implementation that is strictly better than
   today's hardcoded all-false synthesis (it would be *measured* false).
3. **Active-SF capture**: only if step 2 proves the read works and we
   need the active-state fixtures. Force SF on the sacrificial Historian
   VM (stop Runtime DB writer; let the queue spill to SF), re-run the
   probe, capture the non-zero/`Storing=true` response. This is the one
   invasive step and the gate on full success criteria §6.1–6.3.
4. **Map + implement**: add `GrpcGetStoreForwardStatus` to the gRPC
   read orchestrator, map the probed fields onto
   `HistorianStoreForwardStatus`, route `GetStoreForwardStatusAsync`
   to it when `Transport == RemoteGrpc` (keep the synthesized fallback
   for non-gRPC transports and for the "no SF configured" sentinel).
   Add golden-byte fixtures (idle + active) and
   `WcfStoreForwardStatusProtocolTests`-style parse tests. Gate the live
   integration test on `HISTORIAN_GRPC_HOST`.
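
The sweep-then-retry logic of step 1 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the probe's actual code: `get_param` and `open_console` are hypothetical callables standing in for the `GetSFParameter` call and the `OpenStorageConnection` attempt, and the sketch retries whenever every name fails, without inspecting whether the error is auth/handle-shaped.

```python
from typing import Callable, Dict, Iterable, Optional

# Candidate names copied from step 1; the server's real vocabulary is unknown.
CANDIDATE_NAMES = [
    "Status", "Storing", "Pending", "DataStored", "SF.Status",
    "StoreForwardStatus", "Forward", "CacheSize", "ErrorOccurred",
]

# Hypothetical shape: (handle, name) -> value, raising when the server rejects.
GetParam = Callable[[int, str], str]


def sweep(get_param: GetParam, handle: int,
          names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Try every name once; record the value, or None when the call fails."""
    results: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            results[name] = get_param(handle, name)
        except Exception:  # sketch only: any failure counts as "rejected"
            results[name] = None
    return results


def probe(get_param: GetParam, session_handle: int,
          open_console: Callable[[], Optional[int]],
          names: Iterable[str] = tuple(CANDIDATE_NAMES)) -> dict:
    """Sweep with the session handle; retry once with the console handle."""
    first = sweep(get_param, session_handle, names)
    if any(value is not None for value in first.values()):
        return {"handle": "session", "results": first}
    console_handle = open_console()  # None models a failed console open
    if console_handle is None:
        return {"handle": "none", "results": first}
    return {"handle": "console",
            "results": sweep(get_param, console_handle, names)}
```

The `"handle"` key in the result records which handle space answered, which is the fact §9.3 asks for.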

### 9.5 Effort / feasibility summary

- **Risk collapsed:** pull-vs-push (the old plan's worst risk) is settled:
  it's a pull. No duplex WCF/gRPC callback contract.
- **No native struct decode:** `GetSFParameter` returns a *string*; we
  skip the `HISTORIAN_STORAGE_STATUS` C-layout RE entirely (Workstream
  A.2 / D.1 become "nice-to-have for field names", not blocking).
- **Reuses shipped plumbing:** session open + `StorageServiceClient` +
  channel already exist and are live-verified.
- **Remaining unknowns are empirical, one probe-run each:** (a) the
  accepted parameter-name vocabulary, (b) which `Handle` the status RPCs
  want (§9.3, the only thing that could re-block it), (c) the
  active-SF response shape (needs the invasive force-SF step).
- **Net:** Steps 1–2 are low-risk and could land a *measured* idle-state
  `GetStoreForwardStatusAsync` over gRPC quickly. Steps 3–4 (full
  success criteria) still need the sacrificial-VM force-SF capture and
  are gated on §9.3 not landing on the D2 wall.

### 9.6 Out of scope (unchanged from §8, restated for gRPC)

`SetSFParameter`, `ForwardSnapshot*` (SF replay/transfer), the on-disk
cache file format, and redundant-partner SF aggregation all remain out of
scope. R4.3 is read-only status, gRPC-first.

### 9.7 Idle-baseline run: RESULTS (2026-06-21)

Built `HistorianGrpcStoreForwardStatusProbe` + the `grpc-sf-status-probe`
CLI command and ran it against the **live 2023 R2 server** with the
historian in its **idle / not-actively-storing** state (storage interface
v4, authenticated session opened OK). Tested both read-only (`0x402`) and
write-enabled (`0x401`) sessions. Findings, with the §9.3 handle question
**resolved**:

1. **Direct `StorageService` SF pull RPCs are D2-gated, confirming the
   §9.3 worst-case branch.**
   - `GetRemainingSnapshotsSize(session.ClientHandle)` →
     `bSuccess=false`, error buffer `04 84 00 00 00` (= status `0x84` /
     **132 `OperationNotEnabled`**). **Identical under `0x401` and
     `0x402`**, so it is NOT the read/write connection-mode gate; the
     History-session `ClientHandle` is simply not a valid handle for this
     op's handle-space.
   - `GetSFParameter(session.ClientHandle, <name>)` → server-side
     `RpcException(Unknown, "Exception was thrown by handler")` for **all
     16** candidate names, both session modes.
   - These two ops need the **`OpenStorageConnection` console handle**,
     and `OpenStorageConnection` itself fails with the storage-engine
     console error (`84 55 00 00 00 01 02 00 09 15 00` + ASCII
     `"OpenStorageConnection"`): the **D2 storage-engine-pipe
     wall**, the same root cause that blocks R4.2 revision writes. We
     cannot obtain the console handle, so these two SF RPCs are
     unreachable from a pure managed client. See
     [[project_roadmap_exhausted_2020wcf]].

2. **One reachable session-handle lever found:**
   `StatusService.GetHistorianConsoleStatus(strHandle)` **SUCCEEDS** with
   the session string handle (uppercase Open2 GUID), with no console handle
   needed, and returns `uiConsoleStatus = 3` at idle. This is the only
   SF-adjacent signal reachable from the managed client. **Its enum
   semantics are unknown** (3 = presumably "running/normal"); whether it
   shifts when SF is actively storing is the open question.

3. `StatusService.GetHistorianInfo(strHandle, btRequest)` → `bSuccess=false`
   for every `btRequest` candidate (empty / `u32(0)` / ascii+utf16
   `"StoreForward"`); its request framing is not yet known. Lower-yield
   than `GetHistorianConsoleStatus`; revisit only if needed.
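
The status value quoted in finding 1 can be reproduced from the raw error buffer. A minimal Python sketch, assuming the five-byte buffer is one leading byte followed by a little-endian uint32 status code; that layout is inferred from the single observed buffer and is not confirmed by the contract. `decode_status_buffer` is a hypothetical helper name. The longer `OpenStorageConnection` error buffer is deliberately not decoded, because its layout is unknown.

```python
import struct
from typing import Tuple


def decode_status_buffer(buf: bytes) -> Tuple[int, int]:
    """Split an error buffer into (leading byte, status code).

    Assumed layout: byte 0 is a tag of unknown meaning, bytes 1..4 are a
    little-endian uint32 status code.
    """
    if len(buf) < 5:
        raise ValueError("expected at least 5 bytes")
    tag = buf[0]
    (code,) = struct.unpack_from("<I", buf, 1)  # little-endian uint32 at offset 1
    return tag, code


# The buffer observed for GetRemainingSnapshotsSize at idle.
tag, code = decode_status_buffer(bytes.fromhex("04 84 00 00 00"))
print(f"tag=0x{tag:02x} status=0x{code:02x} ({code})")
```

Under the assumed layout, the observed buffer yields status `0x84`, which is decimal 132, matching the `OperationNotEnabled` value reported above.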

**Net idle-baseline conclusion.** R4.3's clean direct route
(`GetSFParameter` / `GetRemainingSnapshotsSize`) is **blocked behind the
D2 storage-engine console pipe**, exactly like R4.2: a pure managed
client cannot open the console session those ops require. The *only*
reachable SF-adjacent signal is `GetHistorianConsoleStatus` → a status
uint. Two paths forward:

- **(a) Ship a measured idle-state only (SHIPPED + LIVE-VERIFIED 2026-06-21).**
  `HistorianGrpcStatusClient.GetStoreForwardStatusAsync` opens a session,
  calls `GetHistorianConsoleStatus`, and returns
  `HistorianStoreForwardStatus` all-false but *measured*: it actually
  contacts the server and reports `ErrorOccurred=true` (with the underlying
  error) when the server is unreachable or the console-status call fails.
  That is strictly better than the blind hardcoded synthesis, which never
  contacts the server. Routed via `Historian2020ProtocolDialect.GetStoreForwardStatusAsync`
  when `Transport == RemoteGrpc` (non-gRPC keeps the synthesized fallback).
  Gated live test `HistorianGrpcIntegrationTests.GetStoreForwardStatusAsync_OverGrpc_ReturnsMeasuredIdleState`
  passes against the real 2023 R2 server. `Storing`/`Pending`/`DataStored`
  magnitude is intentionally NOT surfaced because it lives behind the D2
  wall (see path (b)).
- **(b) Full success criteria (§6) stay blocked** on the D2 console-pipe
  wall. Decoding the active-SF `uiConsoleStatus` value and any
  `GetSystemParameter` SF keys still needs the invasive force-SF capture
  on a sacrificial Historian, and even then `Storing`/`DataStored`
  magnitude is only available via the D2-gated `GetRemainingSnapshotsSize`.
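
The behaviour described in path (a) can be sketched as a small mapping. This is an illustrative Python sketch, not the shipped C# implementation: `StoreForwardStatus`, its `error` field, `measured_idle_status`, and `get_store_forward_status` are hypothetical names, and the transport is modelled as a plain string.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class StoreForwardStatus:
    """Hypothetical stand-in for HistorianStoreForwardStatus."""
    storing: bool = False
    pending: bool = False
    data_stored: bool = False
    error_occurred: bool = False
    error: Optional[str] = None  # assumed field carrying the underlying error


def measured_idle_status(get_console_status: Callable[[], int]) -> StoreForwardStatus:
    """Contact the server; all-false on success, ErrorOccurred on failure."""
    try:
        # The returned value (3 at idle) is not interpreted: its enum
        # semantics are unknown, so only reachability is measured.
        get_console_status()
    except Exception as exc:
        return StoreForwardStatus(error_occurred=True, error=str(exc))
    return StoreForwardStatus()


def get_store_forward_status(transport: str,
                             get_console_status: Callable[[], int]) -> StoreForwardStatus:
    """Route to the measured path for gRPC, else the synthesized fallback."""
    if transport == "RemoteGrpc":
        return measured_idle_status(get_console_status)
    return StoreForwardStatus()  # synthesized: never contacts the server
```

The two all-false results look identical to a caller, but only the gRPC one proves the server answered; an unreachable server surfaces as `error_occurred=True` there and stays invisible on the fallback path.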

Probe code: `src/AVEVA.Historian.Client/Grpc/HistorianGrpcStoreForwardStatusProbe.cs`,
CLI `grpc-sf-status-probe <host> [port] [--tls] [--dnsid <n>] [--write-session]`.
Writes nothing; releases any console session immediately.