c2d8fb9bc8
Add HistorianGrpcStoreForwardStatusProbe and the `grpc-sf-status-probe` CLI command. The idle-baseline run against the live 2023 R2 server resolves the plan's §9.3 handle question: the direct StorageService SF pull RPCs (GetSFParameter / GetRemainingSnapshotsSize) require the OpenStorageConnection console handle and are D2-gated (err 132, identical under read-only and write-enabled sessions), while StatusService.GetHistorianConsoleStatus IS reachable on the session string handle (=3 at idle). Records the gRPC re-scope and the idle-baseline findings in docs/plans/store-forward-cache-reverse-engineering.md §9. The probe writes nothing and releases any console session immediately. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
# Store/Forward Cache Reverse-Engineering Plan

Last updated: 2026-06-21

> **2026-06-21 R4.3 re-scope — read this first.** The original plan below
> (2026-05-04) was written against the 2020 Net.TCP/WCF transport, before the
> 2023 R2 gRPC transport existed. Its single biggest open risk — *"is SF state
> readable via a one-shot pull, or only via a duplex push contract we'd have to
> add?"* (Q1/Q2 + §4 Step 3 + Risk 4) — is now **answered: pull, no duplex**.
> The recovered gRPC `StorageService` contract exposes SF state as plain
> request/response RPCs. The current R4.3 scope and recommended path are in
> §9 ("2026-06-21 gRPC re-scope"); the 2020-WCF body below is retained as
> background, not the recommended route.

Original last-updated: 2026-05-04

This document plans the reverse-engineering effort needed to replace the
synthesized `GetStoreForwardStatusAsync` in
`src/AVEVA.Historian.Client/Wcf/HistorianWcfStatusClient.cs` (lines 101-117)
with a real, evidence-backed implementation. It is a *plan*, not the work
itself. No code changes; no captures collected.

Read this together with:

- `docs/reverse-engineering/handoff.md` — read/event protocol decoding state
- `src/AVEVA.Historian.Client/Wcf/Contracts/IStorageServiceContract.cs` — the
  WCF contract that already declares the SF parameter ops
- `src/AVEVA.Historian.Client/Models/HistorianStoreForwardStatus.cs` — the
  output model the implementation must populate

## 1. Goal

"SF support works" means, end-to-end:

1. **Primary deliverable.** `client.GetStoreForwardStatusAsync()` against a
   live local Historian returns a `HistorianStoreForwardStatus` whose
   `Pending`, `Storing`, `DataStored`, `ErrorOccurred`, `Error`, `ServerName`,
   and `ConnectionKind` fields reflect actual server-reported state, not the
   synthesized defaults at
   `HistorianWcfStatusClient.cs:107-117`.
2. **Secondary deliverable.** The SDK can also answer the higher-level
   "is SF currently buffering?" question accurately when the runtime DB is
   *down*, not just when it is up. That is the case the real native client
   handles correctly and where the synthesized default (`Storing = false`,
   `ErrorOccurred = false`) is silently wrong today.
3. **Non-goals.** Writing into SF, replaying SF buffers, configuring SF
   parameters, redundant-partner SF aggregation
   (`HistorianStoreForwardStatus.AddPartnerStoreForwardStatus`,
   token `0x060060B8`). Read-only matches the project mission in
   `CLAUDE.md`.

The success bar is parity with the native wrapper's
`ArchestrA.HistorianAccess.GetStoreForwardStatus`
(MD token `0x06006186` in `current/aahClientManaged.dll`),
not a superset.

## 2. Architecture Investigation (open questions, in priority order)

Answer these before writing any production code. Each has a discovery action
in §3.

### Q1. Is SF status read from a local in-process struct, a separate WCF endpoint, or a Named Pipe IPC?

Current evidence: **all three are plausible, but the wrapper actually uses
"in-process struct kept current by server-pushed WCF events"**. Specifically:

- `ArchestrA.HistorianAccess.GetStoreForwardStatus`
  (token `0x06006187`, the private 2-arg overload) does *not* call WCF.
  It calls `mdas_GetStorageStatus` (a `calli` against the
  `INSQL_MDAS_ERROR (IntPtr handle, uint, HISTORIAN_STORAGE_STATUS*)` C
  signature in `current/aahClient.dll` exports) and then maps the result
  through `HistorianAccessUtil.ConvertUnmanagedSFStorageStatusToManagedStorageStatus`
  (token `0x060060E4`).
- Mutators like `CConfigStatusClient.SetMdasStoreForwardEvent`
  (token `0x060029DC`) and `aahClientCommon.CStatus.SetStoreForwardEvent`
  (token `0x06002A04`) are wired to the WCF callback
  `IStatusServiceContract2.SetStoreForwardEvent`
  (`StatusServiceContract.IStatusServiceContract2.SetStoreForwardEvent`,
  token `0x06005F57`). The server *pushes* SF state changes; the client
  caches them.
- Confirm: read the IL of token `0x06006187` and verify the only native call
  is `mdas_GetStorageStatus`. The first 200 instructions confirm this:
  `GetClient(ConnectionIndex)` → `calli` against the
  `INSQL_MDAS_ERROR(IntPtr,uint,HISTORIAN_STORAGE_STATUS*)` signature →
  `ConvertUnmanagedSFStorageStatusToManagedStorageStatus`.

Implication: **the SDK cannot ship a synchronous probe that calls one WCF
operation and gets the answer**. It must subscribe to the same status-event
stream the native wrapper subscribes to, or call a status query that returns
the cached snapshot from the server.

### Q2. Is there a single-shot WCF query that returns the same snapshot?

Likely yes. Hypothesis: `IStatusServiceContract2.GetHistorianInfo`
(`GETHI`, see `IStatusServiceContract2.cs:24-30`) returns a multi-key status
blob whose schema includes SF state. Alternative: a status-only key passed to
`GetSystemParameter` (already plumbed via `HistorianWcfStatusClient.GetSystemParameterAsync`).
Both are testable without writing protocol code by sending probe payloads
and observing the response shape.

### Q3. Does SF have its own sidecar process / pipe / WCF endpoint we are missing?

Strong evidence the answer is yes when SF is *enabled*:

- `aahClientCommon.CSFConnection.GetSFPipeName` (token `0x06004B72`),
  `GetSFPath` (`0x06004B71`), `IsConnected` (`0x06004B73`), `IsEnabled`
  (`0x06004B6F`) — there is a separately-named SF Named Pipe distinct
  from the main MDAS pipe.
- `aahClientCommon.CSFConnection.StartStoreforward` (token `0x06004BC6`).
- `IStorageServiceContract` already declares `GetStoreForwardParameter`
  / `SetStoreForwardParameter` (`GetSFP`/`SetSFP`,
  see `IStorageServiceContract.cs:81-85`) and `Storage` is a separate
  WCF service slot in `HistorianWcfServiceNames.cs:15`.
- `CWcfConfig.ConfigurePipeProxy<IStorageServiceContract>` (token
  `0x06004B1C`) and `CWcfConfig.ConfigureTcpProxy<IStorageServiceContract>`
  (token `0x06004B1B`) confirm the storage proxy supports both transports —
  same dual-transport pattern the History/Retrieval proxies use.
- `CStorageEngineConsoleClient.GetPipeNameStr` (token `0x06000E2D`) /
  `GetFullPipeNameStr` (token `0x06000E2E`) wraps the storage-engine
  console pipe via `STransactPipeClient2` (a *non-WCF* binary pipe
  protocol).

Open: **is the SF sidecar even running on the dev host this SDK is being
tested against?** `handoff.md` does not record an SF process being
observed. `aveva-install-x64/` and `aveva-install-x86/` ship only DLLs
(no `aahStoreForwardClient.exe` / `aahSFClient.exe` / similar). The SF
sidecar is part of the Historian *server* install, not the client
redistributable. So:

- On the developer machine, SF is reachable only because the local
  Historian server is installed.
- A pure-client install (the deployment target this SDK ships into) may
  *never* have SF.

This shapes the success criteria: when SF is not configured, a correct
implementation returns `Pending = false`, `ErrorOccurred = false`,
`DataStored = false`, `Storing = false` — i.e. the same shape the
synthesized defaults produce today. The interesting case is *when SF is
configured and active*.

### Q4. Is SF state authoritative on the Historian server or on a per-client basis?

The native wrapper reads it from `HistorianClient*` (the per-connection C++
object). This means it is *connection-scoped* server-pushed state. We
do not need to enumerate cluster-wide SF state — the server reports
"my SF buffer for this client's writes" only. This matches our read-only
mission: we are not a writer, so the only SF state of interest is the
server-side cache for *other* writers, which the server can report to
us as a passive observer.

### Q5. Does any SF probe require Admin?

`CSFConnection.GetSFPipeName` returns a kernel object name. Reading
from it requires the pipe ACL to permit the caller. If the SF pipe is
ACL'd to `LocalSystem` only, the SDK cannot read it without
impersonation — and the SDK runs as the calling process. This is a
hard limit, not a bug.

## 3. Discovery Workstreams

Run A–C in parallel; D follows A and C, and E is a fallback. A–C need no
live server beyond what the existing test rig already has.

### Workstream A — Static IL inspection (parallel-safe, read-only)

Owner action items, in order:

1. Dump full IL of token `0x06006187`
   (`HistorianAccess.GetStoreForwardStatus(ConnectionIndex,out)`):

   ```powershell
   dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- `
     dnlib-method current\aahClientManaged.dll HistorianAccess.GetStoreForwardStatus --instructions
   ```

   Save under `docs/reverse-engineering/historianaccess-getstoreforwardstatus-il-latest.txt`.
   Confirm the `calli` target signature
   `INSQL_MDAS_ERROR(IntPtr,uint,HISTORIAN_STORAGE_STATUS*)` and that
   it touches no WCF entry points.
2. Dump IL of `HistorianAccessUtil.ConvertUnmanagedSFStorageStatusToManagedStorageStatus`
   (token `0x060060E4`). This is the unmanaged→managed mapping; it
   tells us which fields of `HISTORIAN_STORAGE_STATUS` populate which
   fields of `HistorianStoreForwardStatus`. We will need the same
   mapping in reverse on the wire response.
3. Inventory every method that *writes* into the local SF status
   struct:

   ```
   methods current\aahClientManaged.dll SetStoreForward
   methods current\aahClientManaged.dll SetMdasStoreForward
   ```

   The known set as of writing:
   `CConfigStatusClient.SetMdasStoreForwardEvent` (`0x060029DC`),
   `aahClientCommon.CStatus.SetStoreForwardEvent` (`0x06002A04`),
   `CStatusConnectionDirect.SetStoreForwardEvent` (`0x06004DF8`),
   `CStatusConnectionWCF.SetStoreForwardEvent` (`0x06004E4E`),
   `CClientCommon.SetStoreForwardEventOnServer` (`0x06002EC0`).
   The `WCF` variant is the one whose IL maps onto
   `IStatusServiceContract2.SetStoreForwardEvent`
   (token `0x06005F57`) — read its IL and document the request/response
   shape.
4. Dump the parameter types of `IStatusServiceContract2.SetStoreForwardEvent`
   (`0x06005F57`). The `[OperationContract]`
   declaration in the wrapper assembly already encodes the wire shape;
   this gives us the bytes the server pushes us.

### Workstream B — Install inventory (parallel-safe)

1. Inventory `aveva-install-x64\` and `aveva-install-x86\` for any
   binary whose name contains `Store`, `Forward`, `SF`, `Cache`,
   `Spool`. As of this checkout: **none**, only DLLs. Confirm.
2. Inventory the deployed Historian server (out-of-band; not in this
   repo) for `aahStoreForwardClient.exe`,
   `aahStoreForwardServer.exe`, `aahSFCache.exe`, or any service
   registered with `Description` matching `*Forward*`. Capture the
   service name, account identity, and pipe ACLs (`accesschk -wuvc`).
3. Walk the registry: `HKLM\SOFTWARE\ArchestrA\Historian` and any
   sub-key matching `*StoreForward*`, recording paths and pipe names.
   Sanitize before committing.

### Workstream C — WCF probe (parallel-safe)

Use the existing `wcf-probe` and `wcf-status` subcommands of
`tools\AVEVA.Historian.ReverseEngineering`:

1. `wcf-probe $env:HISTORIAN_HOST 32568` — confirm `Storage/GetV` is
   reachable. (It is the third service slot in
   `HistorianWcfServiceNames`.) Document the returned interface
   version.
2. `wcf-status $env:HISTORIAN_HOST 32568 <param-name>` — sweep
   plausible SF parameter names (`SF.Status`, `StoreForward.State`,
   `SFCacheBytes`, etc.) through `GetSystemParameter` and record what
   the server accepts. Cheap, read-only, no session needed beyond the
   already-decoded auth chain.
3. Probe `GetHistorianInfo` (`GETHI`,
   `IStatusServiceContract2.cs:24`) with the byte request shape used
   by the native wrapper. The request bytes are visible if we run
   `instrument-wcf-readquery`-style instrumentation against
   `CConfigStatusClient.SetMdasStoreForwardEvent`'s upstream caller —
   see Workstream D.

### Workstream D — Native capture (sequential after A and C)

Two captures are needed:

1. **Native call to `mdas_GetStorageStatus`.** Run
   `tools\AVEVA.Historian.NativeTraceHarness` with a new scenario
   `--scenario sfstatus` (to be added) that invokes
   `HistorianAccess.GetStoreForwardStatus()` and dumps the
   `HISTORIAN_STORAGE_STATUS` C struct memory before the managed
   conversion runs. This pins the binary layout of the struct
   (offsets, field widths, endianness) without us guessing.
2. **WCF push of SF events.** Configure the local Historian to enter
   SF mode (stop the runtime DB writer; let the writer's queue
   trigger SF) and capture the WCF traffic with a sibling of the
   existing `instrument-wcf-readquery` subcommand: add an
   `instrument-wcf-setstoreforwardevent` subcommand that
   IL-rewrites `aahClientManaged.dll` to log the bytes the server
   sends to `IStatusServiceContract2.SetStoreForwardEvent`. Save
   the rewrite under `docs/reverse-engineering/dnlib-write-copy/`,
   never `current/`.

Workstream D is the only step that needs an actively-storing SF
sidecar. Plan: stop the Historian Runtime DB SQL service, write a
single test point via the wrapper's writer harness, and capture the
SF event push, then restart Runtime DB and capture the
"end-of-SF / data drained" push.

### Workstream E — On-disk cache (only if Workstream D fails)

If the WCF push protocol turns out to be impractical to reproduce
(e.g. requires duplex contract, callback channel, or a server-side
session-bind we cannot match from our managed client), fall back to
inspecting the on-disk SF cache directly. Steps:

1. Resolve `CSFConnection.GetSFPath` IL to find the cache directory
   convention (likely `%ProgramData%\ArchestrA\Historian\Cache\` or
   similar — to be confirmed, **never assume the path**).
2. Inventory file types: `.sfdata`, `.sfindex`, `.cache` — whatever
   the directory contains.
3. Decode the file header. The presence/size of `.sfdata` files is
   sufficient to populate `DataStored` and `Pending`; we do not
   need to decode the value payload.

This fallback is only for `DataStored` / `Pending`. `Storing` and
`Error` fundamentally require a live server-state read.

## 4. Concrete Reverse-Engineering Steps (execution order)

This mirrors the read/event decoding workflow that succeeded for raw
queries.

### Step 1 — Find native methods that touch SF

Already done; baseline evidence is recorded in §2 Q1/Q3 above. Key
tokens to reference:

- `0x06006186`, `0x06006187` — public/private
  `HistorianAccess.GetStoreForwardStatus`
- `0x060060E4` —
  `HistorianAccessUtil.ConvertUnmanagedSFStorageStatusToManagedStorageStatus`
- `0x060029DC` — `CConfigStatusClient.SetMdasStoreForwardEvent`
- `0x06002A04` — `aahClientCommon.CStatus.SetStoreForwardEvent`
- `0x06002DFF` — `aahClientCommon.CClientCommon.IsInStoreForward`
- `0x06002E18` — `aahClientCommon.CClientCommon.SetStoreForwardParams`
- `0x06002EC0` — `CClientCommon.SetStoreForwardEventOnServer`
- `0x06004BC6` — `aahClientCommon.CSFConnection.StartStoreforward`
- `0x06004B6F`..`0x06004B73` — CSFConnection getters (path, pipe,
  enabled, connected)
- `0x06004DF8`, `0x06004E4E` — direct vs WCF status connections
- `0x06005F57` — `IStatusServiceContract2.SetStoreForwardEvent` MD ref
- `0x06006193` — `HistorianAccess.IsBothConnectionRequested` (used by
  the public arity-0 GetStoreForwardStatus to decide whether to fan
  out to a redundant partner)

### Step 2 — Decode `HISTORIAN_STORAGE_STATUS` layout

Run Workstream A.2 (decode token `0x060060E4`) and Workstream D.1
(native struct memory dump). Together they pin the field layout.

The managed struct fields we already know we need to populate
(from `HistorianStoreForwardStatus.cs`):
`ServerName`, `Pending`, `ErrorOccurred`, `Error`, `DataStored`,
`Storing`, `ConnectionKind`. The native struct will have ≥7
fields plus padding. Express the mapping as a comment table in
the implementation.

### Step 3 — Decide the wire model

Two possible implementations:

1. **Push-mode (native parity).** SDK opens an authenticated WCF
   session that the server treats as a status subscriber, listens
   for `IStatusServiceContract2.SetStoreForwardEvent` callbacks,
   maintains a local cache, and `GetStoreForwardStatusAsync`
   returns from the cache. This requires WCF duplex
   (`CallbackContract`) which is not currently exercised
   anywhere in `src/AVEVA.Historian.Client/Wcf/`.
2. **Pull-mode (probe).** SDK calls `GetHistorianInfo` (`GETHI`)
   or a discovered `Storage`-service equivalent and maps the
   one-shot response. No subscription state required.

Pull-mode is strongly preferred: it matches the SDK's existing
WCF style, avoids duplex contracts, and the existing code path
in `HistorianWcfStatusClient.GetSystemParameter` is the right
shape. Only fall back to push-mode if Workstream C.3 proves the
server has no pull endpoint that returns SF state.

### Step 4 — Implement the managed contract method

Once Step 3 picks pull-mode, implement against the WCF contract
(likely a new `[OperationContract]` on `IStatusServiceContract2`
or a method on `IStorageServiceContract`). Follow the existing
parameter-naming discipline from the resolved
`ValidateClientCredential` blocker:
**use `[MessageParameter(Name = "...")]` to match exact server
element names — do not let WCF derive them from C# parameter
names.** See `handoff.md` "Active Blocker" entry for the
2026-05-04 fix.

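A minimal sketch of that naming discipline, assuming a pull-mode operation on the storage contract. The interface name, the return type, the parameter list, and every `Name = "..."` value below are placeholders: the real element names have to come from captured traffic. Only the `GetSFP` short name appears in the existing contract, and using it as the `OperationContract` name here is itself an assumption. The fragment needs a reference to `System.ServiceModel` to compile.

```csharp
using System.ServiceModel;

// Hypothetical contract fragment. Nothing here is a recovered server value;
// it only shows where the explicit wire names go.
[ServiceContract(Name = "IStorageServiceContract")]
public interface IStoreForwardParameterSketch
{
    // Without MessageParameter, WCF would derive the element names from the
    // C# identifiers ("handle", "parameterName", ...), which is the failure
    // mode the naming discipline guards against.
    [OperationContract(Name = "GetSFP")]
    [return: MessageParameter(Name = "ReturnElementPlaceholder")]
    uint GetStoreForwardParameter(
        [MessageParameter(Name = "HandleElementPlaceholder")] uint handle,
        [MessageParameter(Name = "NameElementPlaceholder")] string parameterName,
        [MessageParameter(Name = "ValueElementPlaceholder")] out string parameterValue);
}
```

Pinning each name explicitly means a later rename of a C# parameter cannot silently change the wire format.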
### Step 5 — Add golden-byte fixtures

Add request and response fixtures under
`fixtures/protocol/store-forward-status/`:

- `request-get-storage-status.bin` — bytes the SDK sends.
- `response-get-storage-status-running-normal.bin` — server
  not in SF.
- `response-get-storage-status-active-sf.bin` — server actively
  storing.
- `response-get-storage-status-error.bin` — server's SF errored.

Capture sources: the same instrumented native wrapper runs that
populate Workstream D. Sanitize hostnames, GUIDs, and timestamps
before committing.

### Step 6 — Replace the synthesized stub

Replace `SynthesizeStoreForwardStatus` (lines 107-117 of
`HistorianWcfStatusClient.cs`) with a real implementation. Keep
the synthesized fallback for the case where the storage service
returns a "no SF configured" sentinel — that is *not* an error
condition, it is the normal state for client-only deployments.

Add a unit test class `WcfStoreForwardStatusProtocolTests` next
to the existing `WcfDataQueryProtocolTests` etc., with golden-byte
parse tests using the fixtures from Step 5.

Update the operation status table in `README.md:20` from
"synthesized defaults (no SF sidecar to probe)" to
"live-verified" once the integration test passes.

## 5. Risks and Gotchas

1. **SF may not be present on the test host.** The dev Historian
   probably has SF disabled by default; turning it on means
   stopping Runtime DB SQL services, which is invasive. Plan to do
   capture work on a dedicated sacrificial Historian VM, not the
   shared dev box.
2. **SF sidecar may require Admin or LocalSystem to query.** Any
   pipe-direct fallback (Workstream E) will fail under standard
   user accounts. Document the privilege requirement explicitly
   in the SDK XML doc comments on `GetStoreForwardStatusAsync`.
3. **State is volatile.** Probes that take >100 ms can race
   against the server's own SF state machine. Capture *both*
   request and response in the same instrumented run; do not
   try to correlate two captures.
4. **Push-mode would force a duplex WCF contract.** None of the
   existing decoded operations use duplex. Adding it widens the
   managed WCF surface significantly and risks .NET-WCF
   compatibility issues we have not yet hit. Pull-mode first.
5. **The wrapper's `IsBothConnectionRequested` (token `0x06006193`)
   path indicates a "primary + partner" topology.** Out of scope
   for this pass per §1, but if the server returns partner data
   in the same response we must skip-decode (not throw on)
   unknown trailing bytes.
6. **`Open2`-only sessions never receive SF events.** `handoff.md`
   "Active Blocker" notes the wrapper's full chain
   (`OpenConnection3` after the `ValCl` rounds) is the path that
   produces a session the server treats as a real client. SF
   probes must run from inside that chain — re-using
   `HistorianWcfAuthChainHelper.OpenAuthenticatedConnection`,
   the same call site already used by `GetSystemParameter` at
   `HistorianWcfStatusClient.cs:42`.
7. **`HISTORIAN_STORAGE_STATUS` field order is not contractual.**
   The struct is C++ inside the closed source. If AVEVA reorders
   fields between Historian versions, our decoder breaks. Pin the
   decoder to the Historian server version observed at session
   open (already exposed via `IRetrievalServiceContractN`) and
   reject mismatched versions explicitly with
   `ProtocolEvidenceMissingException`. Do not silently best-effort
   parse.
8. **Sanitization.** Pipe names, registry paths, and SF cache
   directory paths can leak hostnames and account names. Run the
   `rg` sanitizer (`handoff.md` "Next Pickup Steps") after every
   doc edit.

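The version pin in risk 7 can be sketched as a guard that runs before any struct decode. This is an illustration under stated assumptions: the exception class below is a local stand-in for the SDK's `ProtocolEvidenceMissingException`, the guard class name is hypothetical, and the version string is a placeholder rather than an observed server version.

```csharp
using System;
using System.Collections.Generic;

// Local stand-in so the sketch compiles on its own; the SDK has its own type.
public sealed class ProtocolEvidenceMissingException : Exception
{
    public ProtocolEvidenceMissingException(string message) : base(message) { }
}

public static class StorageStatusDecoderGuard
{
    // Server versions for which a struct layout has actually been captured.
    // The single entry is a placeholder, not a real version.
    private static readonly HashSet<string> VerifiedVersions =
        new(StringComparer.Ordinal) { "0.0.0-placeholder" };

    // Throws instead of guessing when no layout evidence exists for the version.
    public static void EnsureLayoutKnown(string serverVersion)
    {
        if (!VerifiedVersions.Contains(serverVersion))
        {
            throw new ProtocolEvidenceMissingException(
                $"No captured HISTORIAN_STORAGE_STATUS layout for server version '{serverVersion}'.");
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        StorageStatusDecoderGuard.EnsureLayoutKnown("0.0.0-placeholder"); // passes
        try
        {
            StorageStatusDecoderGuard.EnsureLayoutKnown("unseen-version");
        }
        catch (ProtocolEvidenceMissingException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

An explicit allow-list keeps the failure loud: a new server version produces an exception at decode time rather than a plausible-looking but misaligned status.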
## 6. Success Criteria

A real implementation is "done" when all of the following hold:

1. `client.GetStoreForwardStatusAsync()` returns
   `Pending = true` and `Storing = true` while the local
   Historian's SF cache is actively buffering writes (verifiable
   by stopping the Runtime DB and writing a value).
2. Returns `Pending = false` and `Storing = false` within
   5 seconds after the Runtime DB recovers and SF drains.
3. Returns `ErrorOccurred = true` and a non-null, actionable
   `Error` message when the SF cache itself fails (disk full,
   pipe closed, etc.).
4. Returns the synthesized "no SF" shape (all-false) without
   throwing on a Historian where SF is not configured.
5. Two new golden-byte unit tests pass (active-SF and idle-SF
   responses).
6. `ProtocolGuardrailTests` no longer needs to exempt
   `GetStoreForwardStatusAsync` from any "must throw
   `ProtocolEvidenceMissingException`" rule — the method is now
   evidence-backed.
7. Live integration test
   `HistorianClientIntegrationTests.GetStoreForwardStatusAsync_ReturnsServerState`
   (to be added) passes when `HISTORIAN_HOST` is set, skips
   cleanly otherwise.
8. `README.md:20` operation status table is updated from
   "synthesized defaults" to "live-verified".

## 7. Open Questions for the Implementer

Resolve these before writing production code:

1. Does the server expose a *pull* endpoint that returns the full
   `HISTORIAN_STORAGE_STATUS` snapshot, or only push events?
   (Workstream C.3 answers this.)
2. What is the binary layout of `HISTORIAN_STORAGE_STATUS`?
   (Workstream A.2 + D.1.)
3. What is the `[OperationContract]` shape on
   `IStatusServiceContract2.SetStoreForwardEvent`? Specifically:
   parameter count, byte-buffer parameters, and exact
   `MessageParameter` names? (Workstream A.4.)
4. Is the `Storage` service slot at
   `net.pipe://<host>/Storage` and `net.tcp://<host>:32568/Storage`
   reachable on a non-Historian-server install? Or is the endpoint
   absent when only the client redistributable is present? (Workstream
   B + C.1.)
5. Does the SF status snapshot include partner / redundant SF
   state inline, or is it returned from a separate call?
   (Workstream A.1, look for branches under
   `IsBothConnectionRequested`.)
6. Does the SF status read require `OpenConnection3` to have
   succeeded, or is `Open2` enough? (Trial: try the discovered
   pull endpoint after `Open2` only, before doing
   `OpenConnection3`. If it works, the implementation is much
   simpler.)
7. What happens when SF is *disabled* by configuration vs
   *enabled but idle*? Both should map to `Pending=false,
   Storing=false`, but the underlying server response may be a
   sentinel error vs an all-zeros struct. The implementation must
   distinguish "no SF" (return defaults silently) from "SF errored"
   (return `ErrorOccurred = true`).

## 8. Out of Scope

Explicitly not part of this plan:

- SF write-back (the project mission is read-only;
  `IStorageServiceContract.AddStreamValues` etc. stay
  unimplemented).
- Setting SF parameters
  (`IStorageServiceContract.SetStoreForwardParameter`).
- Redundant-partner SF aggregation
  (`HistorianStoreForwardStatus.AddPartnerStoreForwardStatus`).
- Reverse-engineering the on-disk SF cache file format beyond
  presence / file count (Workstream E is a fallback, not a
  primary deliverable).
- Anything in the
  `aahClientCommon.CSFConnection.StartStoreforward` /
  `SetStorageStopped` / `SetTagSynchronized` write surface.

## 9. 2026-06-21 gRPC re-scope (current R4.3 plan)

This supersedes the recommended *route* in §2/§3/§4. The deliverable
(§1) and success criteria (§6) are unchanged. What changed is the
transport and the resolved architecture risk.

### 9.1 What the recovered gRPC contract already gives us

The 2023 R2 contract under `src/AVEVA.Historian.Client/Grpc/Protos/`
exposes SF state through **first-class pull RPCs** on `StorageService`
(`StorageService.proto`) — no duplex/callback contract, no native
`HISTORIAN_STORAGE_STATUS` C-struct decode:

- `GetSFParameter(uint32 Handle, string ParameterName)
  → (Status status, string ParamaterValue)` — the direct analogue of the
  already-shipped `GetSystemParameter`/`GetRuntimeParameter` string-keyed
  pulls. This is the primary SF-state lever: a name→value read.
- `GetRemainingSnapshotsSize(uint32 Handle)
  → (Status status, uint64 SnapshotSize)` — the pending-buffer magnitude
  in one call. Non-zero ⇒ data is queued (`Pending`/`DataStored=true`);
  zero ⇒ drained. The cleanest single signal for the idle-vs-active split.
- `GetInfo(string Request) → (Status status, bytes info)` — generic
  server info blob; a fallback if a named SF key lives here instead of in
  `GetSFParameter`.
- `OpenStorageConnectionResponse.ServerStatus` (field 5) and the
  `GetSnapshots`/`StartQuerySnapshot` family — secondary signals.

`SetSFParameter` exists too but is **out of scope** (read-only mission, §8).

The `TransactionService.ForwardSnapshot{,Begin,End}` RPCs are the SF
cache *replay/transfer* path (write-side), **not** a status read — also
out of scope here; they belong to the deferred bit-faithful SF cache work,
not to `GetStoreForwardStatusAsync`.

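The `GetRemainingSnapshotsSize` rule above (non-zero means queued, zero means drained) can be sketched as a small mapping function. The record and class names are hypothetical stand-ins, not SDK types, and the sketch deliberately leaves `Storing` unset because the text does not say the snapshot size alone determines it.

```csharp
#nullable enable
using System;

// Hypothetical carrier for the two fields the snapshot size can inform.
public sealed record SfSnapshotReading(bool Pending, bool DataStored);

public static class SnapshotSizeMapper
{
    // Returns null when the RPC itself failed, so the caller can tell
    // "no evidence" apart from a measured "drained" reading.
    public static SfSnapshotReading? Map(bool rpcSucceeded, ulong snapshotSize)
    {
        if (!rpcSucceeded)
        {
            return null;
        }

        bool queued = snapshotSize != 0;
        return new SfSnapshotReading(Pending: queued, DataStored: queued);
    }
}

public static class Demo
{
    public static void Main()
    {
        Console.WriteLine(SnapshotSizeMapper.Map(true, 0)?.Pending);    // drained reading
        Console.WriteLine(SnapshotSizeMapper.Map(true, 4096)?.Pending); // queued reading
        Console.WriteLine(SnapshotSizeMapper.Map(false, 0) is null);    // failed RPC, no reading
    }
}
```

Returning `null` on a failed call rather than an all-false reading keeps a blocked RPC from being reported as a measured idle state.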
### 9.2 Plumbing that already exists (reuse, don't rebuild)

- `HistorianGrpcHandshake.OpenSession` — authenticated gRPC session
  (`ValidateClientCredential` NTLM loop + Open2) yielding `ClientHandle`
  (uint) + storage-session GUID. Live-verified against the 2023 R2 box.
- `HistorianGrpcStorageConnectionProbe` — already constructs a
  `StorageService.StorageServiceClient`, primes `GetInterfaceVersion`, and
  calls `OpenStorageConnection`/`CloseStorageConnection`. The SF-status
  probe is a near-clone that swaps the `OpenStorageConnection` body for
  `GetSFParameter`/`GetRemainingSnapshotsSize` calls.
- `HistorianGrpcChannelFactory` / `HistorianGrpcConnection` — channel,
  metadata, deadlines.

### 9.3 The one open risk that survives: which `Handle`?

`GetSFParameter`/`GetRemainingSnapshotsSize` both take `uint32 Handle`.
Unknown: do they accept the **session `ClientHandle`** (from
`OpenSession`, which is cheap and unblocked), or do they require the
**storage console `Handle`** returned by `OpenStorageConnection` — which
is the D2 wall (`OpenStorageConnection` routes to the
`\\.\pipe\aahStorageEngine\console` session and is the same storage-engine
pipe that blocks revision writes)? See
[[project_roadmap_exhausted_2020wcf]] and `HistorianGrpcStorageConnectionProbe`
header.

- **Best case:** these read-only status RPCs accept the session
  `ClientHandle` (status reads shouldn't need a console writer session).
  Then R4.3-over-gRPC is unblocked end-to-end and is a small, shippable
  feature.
- **Worst case:** they require the `OpenStorageConnection` `Handle` ⇒
  R4.3 inherits the D2 storage-engine-pipe wall and stays blocked on the
  same root cause as R4.2. Either way the probe answers it in one run.

### 9.4 Discovery steps (execution order)

1. **Add `grpc-sf-status-probe` to `tools/AVEVA.Historian.ReverseEngineering`**
   (mirror `HistorianGrpcStorageConnectionProbe`). Against the live 2023 R2
   server it:
   - opens an authenticated session and gets `ClientHandle`;
   - calls `GetRemainingSnapshotsSize(ClientHandle)` and reports
     `status.bSuccess` + `SnapshotSize` + any error buffer;
   - sweeps `GetSFParameter(ClientHandle, name)` over a candidate
     name list (`Status`, `Storing`, `Pending`, `DataStored`,
     `SF.Status`, `StoreForwardStatus`, `Forward`, `CacheSize`,
     `ErrorOccurred`, plus any names surfaced by Workstream A's IL of
     `ConvertUnmanagedSFStorageStatusToManagedStorageStatus`);
   - records which names the server accepts and the returned values;
   - if every call fails with an auth/handle-shaped error, retries once
     with the `OpenStorageConnection` `Handle` to disambiguate §9.3.
2. **Idle baseline first:** run against the server with SF *not* active.
   This establishes the "no SF / drained" response shape (expected:
   `SnapshotSize=0`, parameter reads succeed-with-defaults or
   return a "not configured" sentinel). This alone may be enough to ship
   an honest idle-state implementation that is strictly better than
   today's hardcoded all-false synthesis (it would be *measured* false).
3. **Active-SF capture:** only if step 2 proves the read works and we
   need the active-state fixtures. Force SF on the sacrificial Historian
   VM (stop Runtime DB writer; let the queue spill to SF), re-run the
   probe, capture the non-zero/`Storing=true` response. This is the one
   invasive step and the gate on full success criteria §6.1–6.3.
4. **Map + implement:** add `GrpcGetStoreForwardStatus` to the gRPC
   read orchestrator, map the probed fields onto
   `HistorianStoreForwardStatus`, route `GetStoreForwardStatusAsync`
   to it when `Transport == RemoteGrpc` (keep the synthesized fallback
   for non-gRPC transports and for the "no SF configured" sentinel).
   Add golden-byte fixtures (idle + active) and
   `WcfStoreForwardStatusProtocolTests`-style parse tests. Gate the live
   integration test on `HISTORIAN_GRPC_HOST`.

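The control flow of the step 1 probe (sweep the candidate names on the session handle, then retry once on the console handle only if nothing was accepted) can be sketched as follows. This is an illustrative Python sketch, not the probe's actual code: `get_param`, `open_console_handle`, `SweepResult`, and `run_probe` are hypothetical names, the real RPC is replaced by an injected callable that raises on failure, and the "auth/handle-shaped error" check is simplified to "every call failed".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Candidate names from the step 1 list (names surfaced later by IL analysis
# would be appended to this list).
CANDIDATE_NAMES = [
    "Status", "Storing", "Pending", "DataStored", "SF.Status",
    "StoreForwardStatus", "Forward", "CacheSize", "ErrorOccurred",
]


@dataclass
class SweepResult:
    accepted: Dict[str, str] = field(default_factory=dict)  # name -> value
    rejected: Dict[str, str] = field(default_factory=dict)  # name -> error text
    handle_used: str = "session"
    console_error: Optional[str] = None  # set if the console handle could not be opened


def sweep(get_param: Callable[[object, str], str], handle: object,
          names) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Try every candidate name on one handle; never abort on a single failure."""
    accepted, rejected = {}, {}
    for name in names:
        try:
            accepted[name] = get_param(handle, name)
        except Exception as exc:  # hypothetical: the stand-in RPC raises on failure
            rejected[name] = str(exc)
    return accepted, rejected


def run_probe(get_param, session_handle, open_console_handle,
              names=CANDIDATE_NAMES) -> SweepResult:
    accepted, rejected = sweep(get_param, session_handle, names)
    result = SweepResult(accepted, rejected)
    if accepted:
        return result  # the session handle works for at least one name
    # Nothing accepted: retry once with the console handle to tell
    # "wrong handle" apart from "wrong parameter names".
    try:
        console_handle = open_console_handle()
    except Exception as exc:
        result.console_error = str(exc)
        return result
    result.accepted, result.rejected = sweep(get_param, console_handle, names)
    result.handle_used = "console"
    return result
```

Keeping the per-name errors rather than stopping at the first failure is what lets a single run separate the parameter-name vocabulary question from the handle question.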
### 9.5 Effort / feasibility summary

- **Risk collapsed:** pull-vs-push (the old plan's worst risk) is settled:
  it's a pull. No duplex WCF/gRPC callback contract.
- **No native struct decode:** `GetSFParameter` returns a *string*; we
  skip the `HISTORIAN_STORAGE_STATUS` C-layout RE entirely (Workstream
  A.2 / D.1 become "nice-to-have for field names", not blocking).
- **Reuses shipped plumbing:** session open + `StorageServiceClient` +
  channel already exist and are live-verified.
- **Remaining unknowns are empirical, one probe-run each:** (a) the
  accepted parameter-name vocabulary, (b) which `Handle` the status RPCs
  want (§9.3, the only thing that could re-block it), (c) the
  active-SF response shape (needs the invasive force-SF step).
- **Net:** Steps 1–2 are low-risk and could land a *measured* idle-state
  `GetStoreForwardStatusAsync` over gRPC quickly. Steps 3–4 (full
  success criteria) still need the sacrificial-VM force-SF capture and
  are gated on §9.3 not landing on the D2 wall.

### 9.6 Out of scope (unchanged from §8, restated for gRPC)

`SetSFParameter`, `ForwardSnapshot*` (SF replay/transfer), the on-disk
cache file format, and redundant-partner SF aggregation all remain out of
scope. R4.3 is read-only status, gRPC-first.

### 9.7 Idle-baseline run: RESULTS (2026-06-21)

Built `HistorianGrpcStoreForwardStatusProbe` + the `grpc-sf-status-probe`
CLI command and ran it against the **live 2023 R2 server** with the
historian in its **idle / not-actively-storing** state (storage interface
v4, authenticated session opened OK). Tested both read-only (`0x402`) and
write-enabled (`0x401`) sessions. Findings, with the §9.3 handle question
**resolved**:

1. **Direct `StorageService` SF pull RPCs are D2-gated, confirming the
   §9.3 worst-case branch.**
   - `GetRemainingSnapshotsSize(session.ClientHandle)` →
     `bSuccess=false`, error buffer `04 84 00 00 00` (= status `0x84` /
     **132 `OperationNotEnabled`**). **Identical under `0x401` and
     `0x402`**, so it is NOT the read/write connection-mode gate; the
     History-session `ClientHandle` is simply not a valid handle for this
     op's handle-space.
   - `GetSFParameter(session.ClientHandle, <name>)` → server-side
     `RpcException(Unknown, "Exception was thrown by handler")` for **all
     16** candidate names, both session modes.
   - These two ops need the **`OpenStorageConnection` console handle**,
     and `OpenStorageConnection` itself fails with the storage-engine
     console error (`84 55 00 00 00 01 02 00 09 15 00` followed by ASCII
     `"OpenStorageConnection"`). This is the **D2 storage-engine-pipe
     wall**, the same root cause that blocks R4.2 revision writes. We
     cannot obtain the console handle, so these two SF RPCs are
     unreachable from a pure managed client. See
     [[project_roadmap_exhausted_2020wcf]].

2. **One reachable session-handle lever found:**
   `StatusService.GetHistorianConsoleStatus(strHandle)` **SUCCEEDS** with
   the session string handle (uppercase Open2 GUID), with no console handle
   needed, and returns `uiConsoleStatus = 3` at idle. This is the only
   SF-adjacent signal reachable from the managed client. **Its enum
   semantics are unknown** (3 = presumably "running/normal"); whether it
   shifts when SF is actively storing is the open question.

3. `StatusService.GetHistorianInfo(strHandle, btRequest)` →
   `bSuccess=false` for every `btRequest` candidate (empty / `u32(0)` /
   ascii+utf16 `"StoreForward"`); its request framing is not yet known.
   Lower-yield than `GetHistorianConsoleStatus`; revisit only if needed.

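The mapping from the `GetRemainingSnapshotsSize` error buffer `04 84 00 00 00` to status 132 in finding 1 can be reproduced with a few lines. The sketch below is Python and rests on an assumption that is not confirmed by the capture: that the first byte is a tag or length prefix and the next four bytes are a little-endian unsigned 32-bit status code. The function name `decode_status` is hypothetical, and the longer `OpenStorageConnection` buffer is deliberately not decoded because its layout is not described here.

```python
import struct


def decode_status(buffer_hex: str) -> int:
    """Decode the status code from a short error buffer.

    Assumed layout (not confirmed): byte 0 is a tag/length prefix,
    bytes 1..4 are a little-endian uint32 status code.
    """
    raw = bytes.fromhex(buffer_hex)  # whitespace between byte pairs is ignored
    if len(raw) < 5:
        raise ValueError("buffer too short for a prefix byte plus a uint32")
    (status,) = struct.unpack_from("<I", raw, 1)
    return status


status = decode_status("04 84 00 00 00")
print(status, hex(status))  # 132 0x84
```

Under that layout the buffer yields `0x84`, which matches the 132 `OperationNotEnabled` reading above.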
**Net idle-baseline conclusion.** R4.3's clean direct route
(`GetSFParameter` / `GetRemainingSnapshotsSize`) is **blocked behind the
D2 storage-engine console pipe**, exactly like R4.2: a pure managed
client cannot open the console session those ops require. The *only*
reachable SF-adjacent signal is `GetHistorianConsoleStatus` → a status
uint. Two paths forward:

- **(a) Ship a measured idle-state only: SHIPPED + LIVE-VERIFIED 2026-06-21.**
  `HistorianGrpcStatusClient.GetStoreForwardStatusAsync` opens a session,
  calls `GetHistorianConsoleStatus`, and returns
  `HistorianStoreForwardStatus` all-false but *measured*: it actually
  contacts the server and reports `ErrorOccurred=true` (with the underlying
  error) when the server is unreachable or the console-status call fails.
  That is strictly better than the blind hardcoded synthesis, which never
  contacts the server. Routed via
  `Historian2020ProtocolDialect.GetStoreForwardStatusAsync`
  when `Transport == RemoteGrpc` (non-gRPC keeps the synthesized fallback).
  Gated live test
  `HistorianGrpcIntegrationTests.GetStoreForwardStatusAsync_OverGrpc_ReturnsMeasuredIdleState`
  passes against the real 2023 R2 server. `Storing`/`Pending`/`DataStored`
  magnitude is intentionally NOT surfaced because it lives behind the D2
  wall (see path (b)).
- **(b) Full success criteria (§6) stay blocked** on the D2 console-pipe
  wall. Decoding the active-SF `uiConsoleStatus` value and any
  `GetSystemParameter` SF keys still needs the invasive force-SF capture
  on a sacrificial Historian, and even then `Storing`/`DataStored`
  magnitude is only available via the D2-gated `GetRemainingSnapshotsSize`.

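The "all-false but measured" behaviour of path (a) amounts to a small mapping: make the console-status call, report every flag as false on success, and surface the underlying error on failure. The following Python sketch illustrates that shape only. `StoreForwardStatus` and `get_store_forward_status` are hypothetical stand-ins, the field names are assumed from the flags mentioned in this section, and the console-status call is injected as a callable that raises when the server is unreachable or the call fails.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class StoreForwardStatus:
    # Field names are assumptions modelled on the flags named in this plan.
    storing: bool = False
    pending: bool = False
    data_stored: bool = False
    error_occurred: bool = False
    error: Optional[str] = None


def get_store_forward_status(
    get_console_status: Callable[[], int],
) -> StoreForwardStatus:
    """Return a measured idle-state status.

    The console-status value itself (3 at idle) is not interpreted, because
    its enum semantics are unknown; only reachability is measured.
    """
    try:
        get_console_status()
    except Exception as exc:
        # Unreachable server or failed call: report the underlying error.
        return StoreForwardStatus(error_occurred=True, error=str(exc))
    return StoreForwardStatus()


def _unreachable() -> int:
    raise ConnectionError("server unreachable")


print(get_store_forward_status(lambda: 3))
print(get_store_forward_status(_unreachable))
```

The design choice worth noting is that a successful call never sets `storing`, `pending`, or `data_stored`: those magnitudes are not observable through this route, so the sketch reports only what was actually measured.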
Probe code: `src/AVEVA.Historian.Client/Grpc/HistorianGrpcStoreForwardStatusProbe.cs`,
CLI `grpc-sf-status-probe <host> [port] [--tls] [--dnsid <n>] [--write-session]`.
Writes nothing; releases any console session immediately.