histsdk/docs/plans/store-forward-cache-reverse-engineering.md
dohertj2 6f01b83313 Plan two reverse-engineering campaigns: write commands + store/forward cache
docs/plans/write-commands-reverse-engineering.md (425 lines):
  Plan for adding WriteValueAsync (AddS2 stream values), EnsureTags2 for
  analog/discrete/string tags, and DelT for sandbox cleanup. Hard safety
  rules center on a dedicated sandbox tag gated by env var, time-bounded
  writes, SQL ground-truth verification per session, explicit rollback.
  Five-step RE workflow mirrors the read/event decode (static IL discovery
  -> instrument-wcf-writemessage capture -> instrument-wcf-readmessage
  capture -> byte/IL alignment -> managed serializer + golden-byte tests).
  Risks call out auth-chain unknowns, parameter-name-mismatch class,
  silent-success failure modes, History-vs-Storage service question.

docs/plans/store-forward-cache-reverse-engineering.md (501 lines):
  Plan for replacing the synthesized GetStoreForwardStatusAsync with a
  real implementation. Architecture investigation already partially
  answered via IL inspection during planning: ArchestrA.HistorianAccess.
  GetStoreForwardStatus (token 0x06006187) reads an in-process C struct
  via calli to mdas_GetStorageStatus, kept current by server-pushed WCF
  callbacks (IStatusServiceContract2.SetStoreForwardEvent). CSFConnection.
  GetSFPipeName indicates a separate Named Pipe sidecar exists when SF
  is configured. Five parallelizable discovery workstreams, six concrete
  RE steps with cited tokens, eight risks, eight success criteria.

Both plans deliberately produce no code changes and no captures. They
exist so the next implementer can start with full context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 07:16:32 -04:00


Store/Forward Cache Reverse-Engineering Plan

Last updated: 2026-05-04

This document plans the reverse-engineering effort needed to replace the synthesized GetStoreForwardStatusAsync in src/AVEVA.Historian.Client/Wcf/HistorianWcfStatusClient.cs (lines 101-117) with a real, evidence-backed implementation. It is a plan, not the work itself. No code changes; no captures collected.

Read this together with:

  • docs/reverse-engineering/handoff.md — read/event protocol decoding state
  • src/AVEVA.Historian.Client/Wcf/Contracts/IStorageServiceContract.cs — the WCF contract that already declares the SF parameter ops
  • src/AVEVA.Historian.Client/Models/HistorianStoreForwardStatus.cs — the output model the implementation must populate

1. Goal

"SF support works" means, end-to-end:

  1. Primary deliverable. client.GetStoreForwardStatusAsync() against a live local Historian returns a HistorianStoreForwardStatus whose Pending, Storing, DataStored, ErrorOccurred, Error, ServerName, and ConnectionKind fields reflect actual server-reported state, not the synthesized defaults at HistorianWcfStatusClient.cs:107-117.
  2. Secondary deliverable. The SDK can also answer the higher-level "is SF currently buffering?" question accurately when the runtime DB is down, not just when it is up. That is the case the real native client handles correctly and where the synthesized default (Storing = false, ErrorOccurred = false) is silently wrong today.
  3. Non-goals. Writing into SF, replaying SF buffers, configuring SF parameters, redundant-partner SF aggregation (HistorianStoreForwardStatus.AddPartnerStoreForwardStatus, token 0x060060B8). Staying read-only matches the project mission in CLAUDE.md.

The success bar is parity with the native wrapper's ArchestrA.HistorianAccess.GetStoreForwardStatus (MD token 0x06006186 in current/aahClientManaged.dll), not a superset.

2. Architecture Investigation (open questions, in priority order)

Answer these before writing any production code. Each has a discovery action in §3.

Q1. Is SF status read from a local in-process struct, a separate WCF endpoint, or a Named Pipe IPC?

Current evidence: all three were plausible a priori, but IL inspection shows the wrapper uses an in-process struct kept current by server-pushed WCF events. Specifically:

  • ArchestrA.HistorianAccess.GetStoreForwardStatus (token 0x06006187, the private 2-arg overload) does not call WCF. It calls mdas_GetStorageStatus (a calli against the INSQL_MDAS_ERROR (IntPtr handle, uint, HISTORIAN_STORAGE_STATUS*) C signature in current/aahClient.dll exports) and then maps the result through HistorianAccessUtil.ConvertUnmanagedSFStorageStatusToManagedStorageStatus (token 0x060060E4).
  • Mutators like CConfigStatusClient.SetMdasStoreForwardEvent (token 0x060029DC) and aahClientCommon.CStatus.SetStoreForwardEvent (token 0x06002A04) are wired to the WCF callback IStatusServiceContract2.SetStoreForwardEvent (StatusServiceContract.IStatusServiceContract2.SetStoreForwardEvent, token 0x06005F57). The server pushes SF state changes; the client caches them.
  • Confirmation status: the first 200 instructions of token 0x06006187 show mdas_GetStorageStatus as the only native call, in the sequence GetClient(ConnectionIndex) → calli against the INSQL_MDAS_ERROR(IntPtr,uint,HISTORIAN_STORAGE_STATUS*) signature → ConvertUnmanagedSFStorageStatusToManagedStorageStatus. Workstream A.1 dumps the full method to confirm nothing later in the body adds a WCF call.

Implication: the SDK cannot reproduce the wrapper's read path with a single WCF operation, because that path never touches the network. It must either subscribe to the same status-event stream the native wrapper subscribes to, or find a status query that returns the server's cached snapshot (Q2).
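A minimal Python sketch of the read model the Q1 evidence implies: pushed events replace a local snapshot, and the status read returns that snapshot without any I/O. The class and method names are hypothetical; on_set_store_forward_event stands in for a handler of IStatusServiceContract2.SetStoreForwardEvent, whose real parameters are not decoded yet, and the field types are assumptions.

```python
import threading
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class StoreForwardSnapshot:
    # Field names mirror HistorianStoreForwardStatus.cs; the types are assumptions.
    server_name: str = ""
    pending: bool = False
    storing: bool = False
    data_stored: bool = False
    error_occurred: bool = False
    error: Optional[str] = None


class StoreForwardStatusCache:
    """Hypothetical client-side cache: updated by pushed events, read synchronously."""

    def __init__(self, server_name: str) -> None:
        self._lock = threading.Lock()
        self._snapshot = StoreForwardSnapshot(server_name=server_name)

    def on_set_store_forward_event(self, **changes) -> None:
        # Called from the (hypothetical) callback handler whenever the server
        # pushes an SF state change; the whole snapshot is swapped under the lock.
        with self._lock:
            self._snapshot = replace(self._snapshot, **changes)

    def get_status(self) -> StoreForwardSnapshot:
        # Mirrors the wrapper's read path: no network call, just the cached copy.
        with self._lock:
            return self._snapshot
```

Because each snapshot is immutable and replaced as a unit, a reader never observes a half-applied event. This is the push-mode shape that Step 3 below argues against adopting unless no pull endpoint exists.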

Q2. Is there a single-shot WCF query that returns the same snapshot?

Likely yes. Hypothesis: IStatusServiceContract2.GetHistorianInfo (GETHI, see IStatusServiceContract2.cs:24-30) returns a multi-key status blob whose schema includes SF state. Alternative: a status-only key passed to GetSystemParameter (already plumbed via HistorianWcfStatusClient.GetSystemParameterAsync). Both are testable without writing protocol code by sending probe payloads and observing the response shape.

Q3. Does SF have its own sidecar process / pipe / WCF endpoint we are missing?

Strong evidence the answer is yes when SF is enabled:

  • aahClientCommon.CSFConnection.GetSFPipeName (token 0x06004B72), GetSFPath (0x06004B71), IsConnected (0x06004B73), IsEnabled (0x06004B6F) — there is a separately-named SF Named Pipe distinct from the main MDAS pipe.
  • aahClientCommon.CSFConnection.StartStoreforward (token 0x06004BC6).
  • IStorageServiceContract already declares GetStoreForwardParameter / SetStoreForwardParameter (GetSFP/SetSFP, see IStorageServiceContract.cs:81-85) and Storage is a separate WCF service slot in HistorianWcfServiceNames.cs:15.
  • CWcfConfig.ConfigurePipeProxy<IStorageServiceContract> (token 0x06004B1C) and CWcfConfig.ConfigureTcpProxy<IStorageServiceContract> (token 0x06004B1B) confirm the storage proxy supports both transports — same dual-transport pattern the History/Retrieval proxies use.
  • CStorageEngineConsoleClient.GetPipeNameStr (token 0x06000E2D) and GetFullPipeNameStr (token 0x06000E2E) wrap the storage-engine console pipe via STransactPipeClient2 (a non-WCF binary pipe protocol).

Open: is the SF sidecar even running on the dev host this SDK is being tested against? handoff.md does not record an SF process being observed. aveva-install-x64/ and aveva-install-x86/ ship only DLLs (no aahStoreForwardClient.exe, aahSFClient.exe, or similar), which suggests the SF sidecar is part of the Historian server install rather than the client redistributable. If so:

  • On the developer machine, SF is reachable only because the local Historian server is installed.
  • A pure-client install (the deployment target this SDK ships into) may never have SF.

This shapes the success criteria: when SF is not configured, a correct implementation returns Pending = false, ErrorOccurred = false, DataStored = false, Storing = false — i.e. the same shape the synthesized defaults produce today. The interesting case is when SF is configured and active.

Q4. Is SF state authoritative on the Historian server or on a per-client basis?

The native wrapper reads SF state from HistorianClient* (the per-connection C++ object), so it is connection-scoped, server-pushed state. We do not need to enumerate cluster-wide SF state; the server reports only "my SF buffer for this client's writes". This matches our read-only mission, with one tension to resolve during capture: a read-only client issues no writes, so if the buffer really is scoped to the connection's own writes, the SDK may only ever observe an idle buffer. The plan assumes the server also reports its SF cache for other writers to a passive observer; Workstream D should confirm or refute that.

Q5. Does any SF probe require Admin?

CSFConnection.GetSFPipeName returns a kernel object name. Reading from it requires the pipe ACL to permit the caller. If the SF pipe is ACL'd to LocalSystem only, the SDK cannot read it without impersonation — and the SDK runs as the calling process. This is a hard limit, not a bug.

3. Discovery Workstreams

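Workstreams A, B, and C can run in parallel; D runs after A and C, and E runs only if D fails. Apart from Workstream D, which needs an actively storing SF sidecar, none require more than the existing test rig.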

Workstream A — Static IL inspection (parallel-safe, read-only)

Owner action items, in order:

  1. Dump full IL of token 0x06006187 (HistorianAccess.GetStoreForwardStatus(ConnectionIndex,out)):
    dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- `
      dnlib-method current\aahClientManaged.dll HistorianAccess.GetStoreForwardStatus --instructions
    
    Save under docs/reverse-engineering/historianaccess-getstoreforwardstatus-il-latest.txt. Confirm the calli target signature INSQL_MDAS_ERROR(IntPtr,uint,HISTORIAN_STORAGE_STATUS*) and that the method touches no WCF entry points.
  2. Dump IL of HistorianAccessUtil.ConvertUnmanagedSFStorageStatusToManagedStorageStatus (token 0x060060E4). This is the unmanaged→managed mapping; it tells us which fields of HISTORIAN_STORAGE_STATUS populate which fields of HistorianStoreForwardStatus. The SDK needs the same field mapping when it decodes the wire response.
  3. Inventory every method that writes into the local SF status struct:
    methods current\aahClientManaged.dll SetStoreForward
    methods current\aahClientManaged.dll SetMdasStoreForward
    
    The known set as of writing: CConfigStatusClient.SetMdasStoreForwardEvent (0x060029DC), aahClientCommon.CStatus.SetStoreForwardEvent (0x06002A04), CStatusConnectionDirect.SetStoreForwardEvent (0x06004DF8), CStatusConnectionWCF.SetStoreForwardEvent (0x06004E4E), CClientCommon.SetStoreForwardEventOnServer (0x06002EC0). The WCF variant is the one whose IL maps onto IStatusServiceContract2.SetStoreForwardEvent (token 0x06005F57) — read its IL and document the request/response shape.
  4. Dump the signature and parameter types of IStatusServiceContract2.SetStoreForwardEvent (0x06005F57). The [OperationContract] declaration in the wrapper assembly already encodes the wire shape, so this tells us the structure of the message the server pushes.
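A.1's "touches no WCF entry points" check can be made mechanical once the IL dump is saved. The Python sketch below assumes dnlib-style instruction lines such as `IL_002A: calli <signature>`; the exact output format of the tool's --instructions flag is not recorded here, so the pattern and the WCF marker strings are assumptions to adjust against a real dump.

```python
import re
from typing import Iterable, List, Tuple

# Assumed line format: "IL_<hex offset>: <opcode> <operand>".
CALL_RE = re.compile(r"^\s*IL_[0-9A-Fa-f]+:\s+(call|callvirt|calli|newobj)\s+(.+?)\s*$")
# Operand substrings treated as evidence of a WCF entry point (heuristic).
WCF_MARKERS = ("System.ServiceModel", "ServiceContract", "ChannelFactory")


def call_sites(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (opcode, operand) for every call-like instruction in an IL dump."""
    sites = []
    for line in lines:
        m = CALL_RE.match(line)
        if m:
            sites.append((m.group(1), m.group(2)))
    return sites


def wcf_call_sites(lines: Iterable[str]) -> List[str]:
    """Operands that look like WCF entry points; A.1 expects an empty list."""
    return [operand for _, operand in call_sites(lines)
            if any(marker in operand for marker in WCF_MARKERS)]
```

Running call_sites over the saved file also yields the full list of callees, which is a convenient checklist for A.2 and A.3.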

Workstream B — Install inventory (parallel-safe)

  1. Inventory aveva-install-x64\ and aveva-install-x86\ for any binary whose name contains Store, Forward, SF, Cache, Spool. As of this checkout: none, only DLLs. Confirm.
  2. Inventory the deployed Historian server (out-of-band; not in this repo) for aahStoreForwardClient.exe, aahStoreForwardServer.exe, aahSFCache.exe, or any service registered with Description matching *Forward*. Capture the service name, account identity, and ACLs. accesschk -wuvc covers the service ACL; the SF pipe's ACL needs a separate accesschk run against the pipe name.
  3. Walk the registry: HKLM\SOFTWARE\ArchestrA\Historian and any sub-key matching *StoreForward*, recording paths and pipe names. Sanitize before committing.
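B.1 is a filename filter over the two install trees. A minimal Python sketch follows; matching "SF" case-sensitively (and the other substrings case-insensitively) is an assumption made so that incidental lowercase "sf" sequences do not flood the result.

```python
from pathlib import Path
from typing import List

# Substrings from Workstream B.1.
CASE_INSENSITIVE = ("store", "forward", "cache", "spool")
CASE_SENSITIVE = ("SF",)


def is_sf_candidate(name: str) -> bool:
    lower = name.lower()
    return (any(s in lower for s in CASE_INSENSITIVE)
            or any(s in name for s in CASE_SENSITIVE))


def inventory(root: Path) -> List[Path]:
    """Every file under root whose name looks SF-related, sorted for stable diffs."""
    return sorted(p for p in root.rglob("*") if p.is_file() and is_sf_candidate(p.name))
```

Usage: run inventory(Path("aveva-install-x64")) and inventory(Path("aveva-install-x86")); an empty result for both is the expected confirmation of "only DLLs".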

Workstream C — WCF probe (parallel-safe)

Use the existing wcf-probe and wcf-status subcommands of tools\AVEVA.Historian.ReverseEngineering:

  1. wcf-probe $env:HISTORIAN_HOST 32568 — confirm Storage/GetV is reachable. (It is the third service slot in HistorianWcfServiceNames.) Document the returned interface version.
  2. wcf-status $env:HISTORIAN_HOST 32568 <param-name> — sweep plausible SF parameter names (SF.Status, StoreForward.State, SFCacheBytes, etc.) through GetSystemParameter and record what the server accepts. Cheap, read-only, no session needed beyond the already-decoded auth chain.
  3. Probe GetHistorianInfo (GETHI, IStatusServiceContract2.cs:24) with the byte request shape used by the native wrapper. The request bytes become visible by running instrument-wcf-readquery-style instrumentation against the upstream caller of CConfigStatusClient.SetMdasStoreForwardEvent (see Workstream D), so this item depends on that instrumentation and is not fully parallel-safe.
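C.2 is a loop over candidate names in which every outcome, including a failure, is evidence. The Python sketch below keeps the sweep logic separate from transport: probe is a hypothetical stand-in for one `wcf-status <host> 32568 <name>` invocation, assumed to return the value on success, None on rejection, and to raise on transport errors.

```python
from typing import Callable, Dict, Iterable, Optional

# Candidate names from Workstream C.2; none are confirmed server parameters.
CANDIDATES = ("SF.Status", "StoreForward.State", "SFCacheBytes")


def sweep(probe: Callable[[str], Optional[str]],
          names: Iterable[str] = CANDIDATES) -> Dict[str, str]:
    """Call probe(name) for each candidate and record the outcome per name."""
    results: Dict[str, str] = {}
    for name in names:
        try:
            value = probe(name)
        except Exception as exc:  # keep sweeping; a failure is itself evidence
            results[name] = f"error: {type(exc).__name__}: {exc}"
            continue
        results[name] = "rejected" if value is None else f"accepted: {value!r}"
    return results
```

Recording "rejected" separately from "error" matters here: a rejected name tells us the auth chain worked and the name is wrong, while an error may mean the session itself is not valid.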

Workstream D — Native capture (sequential after A and C)

Two captures are needed:

  1. Native call to mdas_GetStorageStatus. Run tools\AVEVA.Historian.NativeTraceHarness with a new scenario --scenario sfstatus (to be added) that invokes HistorianAccess.GetStoreForwardStatus() and dumps the HISTORIAN_STORAGE_STATUS C struct memory before the managed conversion runs. This pins the binary layout of the struct (offsets, field widths, endianness) without us guessing.
  2. WCF push of SF events. Put the sacrificial Historian VM (§5, risk 1) into SF mode (stop the runtime DB writer and let the writer's queue trigger SF) and capture the WCF traffic with a new sibling of the existing instrument-wcf-readquery subcommand: an instrument-wcf-setstoreforwardevent subcommand that IL-rewrites aahClientManaged.dll to log the bytes the server sends to IStatusServiceContract2.SetStoreForwardEvent. Save the rewritten copy under docs/reverse-engineering/dnlib-write-copy/, never current/.

Workstream D is the only step that needs an actively storing SF sidecar. Plan: stop the Historian Runtime DB SQL service, write a single test point via the wrapper's writer harness, and capture the SF event push; then restart Runtime DB and capture the "end-of-SF / data drained" push.
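The D.1 struct dump pins the layout only if offsets can be read straight off it. One way to get there, sketched in Python under the assumption that both dumps come from the same struct size: dump the struct once while SF is idle and once while it is storing, then list the bytes that changed.

```python
from typing import List, Tuple


def hexdump(buf: bytes, width: int = 16) -> str:
    """Offset-annotated hex dump, so field offsets can be read off directly."""
    rows = []
    for off in range(0, len(buf), width):
        chunk = buf[off:off + width]
        rows.append(f"{off:04x}  {chunk.hex(' ')}")
    return "\n".join(rows)


def changed_offsets(before: bytes, after: bytes) -> List[Tuple[int, int, int]]:
    """(offset, old_byte, new_byte) for every byte that differs between two dumps.

    Comparing an idle dump with an actively-storing dump of the same struct
    narrows down which offsets hold the Pending/Storing-style flags.
    """
    if len(before) != len(after):
        raise ValueError("dumps must come from the same struct size")
    return [(i, a, b) for i, (a, b) in enumerate(zip(before, after)) if a != b]
```

A changed offset is a candidate, not a proof: the IL of token 0x060060E4 (A.2) still has to confirm which managed field each offset feeds.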

Workstream E — On-disk cache (only if Workstream D fails)

If the WCF push protocol turns out to be impractical to reproduce (e.g. requires duplex contract, callback channel, or a server-side session-bind we cannot match from our managed client), fall back to inspecting the on-disk SF cache directly. Steps:

  1. Resolve CSFConnection.GetSFPath IL to find the cache directory convention (likely %ProgramData%\ArchestrA\Historian\Cache\ or similar — to be confirmed, never assume the path).
  2. Inventory the file types the directory actually contains. Extensions such as .sfdata, .sfindex, or .cache are guesses until observed.
  3. Decode only enough of each file header to identify the file. The presence and size of the data files are expected to be sufficient to populate DataStored and Pending; we do not need to decode the value payload.

This fallback is only for DataStored / Pending. Storing and Error fundamentally require a live server-state read.
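A minimal Python sketch of this fallback, assuming E.1 yields a cache directory and E.2 confirms a data-file extension (the ".sfdata" default below is a placeholder). Mapping both DataStored and Pending to "a non-empty data file exists" is itself an assumption to revisit once real cache contents are observed.

```python
from pathlib import Path
from typing import Dict

# Placeholder until Workstream E.2 observes the real extension.
DATA_SUFFIX = ".sfdata"


def disk_fallback(cache_dir: Path, data_suffix: str = DATA_SUFFIX) -> Dict[str, bool]:
    """Derive only DataStored/Pending from file presence and size.

    Storing and Error cannot come from disk; they still need a live server read.
    """
    if not cache_dir.is_dir():
        # No cache directory at all: the "no SF" shape, not an error.
        return {"data_stored": False, "pending": False}
    data_files = [p for p in cache_dir.iterdir()
                  if p.is_file() and p.suffix == data_suffix]
    has_data = any(p.stat().st_size > 0 for p in data_files)
    return {"data_stored": has_data, "pending": has_data}
```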

4. Concrete Reverse-Engineering Steps (execution order)

Mirrors the read/event decoding workflow that succeeded for raw queries.

Step 1 — Find native methods that touch SF

Already done; baseline evidence is recorded in §2 Q1/Q3 above. Key tokens to reference:

  • 0x06006186, 0x06006187: public/private HistorianAccess.GetStoreForwardStatus
  • 0x060060E4: HistorianAccessUtil.ConvertUnmanagedSFStorageStatusToManagedStorageStatus
  • 0x060029DC: CConfigStatusClient.SetMdasStoreForwardEvent
  • 0x06002A04: aahClientCommon.CStatus.SetStoreForwardEvent
  • 0x06002DFF: aahClientCommon.CClientCommon.IsInStoreForward
  • 0x06002E18: aahClientCommon.CClientCommon.SetStoreForwardParams
  • 0x06002EC0: CClientCommon.SetStoreForwardEventOnServer
  • 0x06004BC6: aahClientCommon.CSFConnection.StartStoreforward
  • 0x06004B6F..0x06004B73: CSFConnection getters (path, pipe, enabled, connected)
  • 0x06004DF8, 0x06004E4E: direct vs WCF status connections
  • 0x06005F57: IStatusServiceContract2.SetStoreForwardEvent MD ref
  • 0x06006193: HistorianAccess.IsBothConnectionRequested (used by the public arity-0 GetStoreForwardStatus to decide whether to fan out to a redundant partner)

Step 2 — Decode HISTORIAN_STORAGE_STATUS layout

Run Workstream A.2 (decode token 0x060060E4) and Workstream D.1 (native struct memory dump). Together they pin the field layout.

The managed fields we already know we need to populate (from HistorianStoreForwardStatus.cs) are ServerName, Pending, ErrorOccurred, Error, DataStored, Storing, and ConnectionKind. The native struct is therefore expected to hold at least seven fields, possibly with padding. Express the mapping as a comment table in the implementation.
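Once A.2 and D.1 pin the layout, the decoder itself is small. The Python sketch below uses an entirely HYPOTHETICAL layout (five little-endian 32-bit fields); the format string and FIELDS tuple are placeholders that play the role of the comment table above. It reads only the known prefix, so unknown trailing bytes (for example partner data, §5 risk 5) are skipped rather than rejected.

```python
import struct
from typing import Dict

# HYPOTHETICAL layout: replace once Workstream A.2 + D.1 pin the real
# HISTORIAN_STORAGE_STATUS offsets, widths, and endianness.
LAYOUT = struct.Struct("<IIIIi")
FIELDS = ("pending", "storing", "data_stored", "error_occurred", "error_code")
FLAG_FIELDS = FIELDS[:4]


def decode_storage_status(buf: bytes) -> Dict[str, object]:
    if len(buf) < LAYOUT.size:
        raise ValueError(f"need {LAYOUT.size} bytes, got {len(buf)}")
    # unpack_from reads only the first LAYOUT.size bytes; anything after that
    # is ignored instead of causing a failure.
    decoded: Dict[str, object] = dict(zip(FIELDS, LAYOUT.unpack_from(buf, 0)))
    for flag in FLAG_FIELDS:
        decoded[flag] = bool(decoded[flag])
    return decoded
```

Failing loudly on a short buffer while tolerating a long one is deliberate: a short buffer means the layout assumption is wrong, whereas extra bytes are expected if the server appends data this pass does not decode.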

Step 3 — Decide the wire model

Two possible implementations:

  1. Push-mode (native parity). SDK opens an authenticated WCF session that the server treats as a status subscriber, listens for IStatusServiceContract2.SetStoreForwardEvent callbacks, maintains a local cache, and GetStoreForwardStatusAsync returns from the cache. This requires WCF duplex (CallbackContract), which is not currently exercised anywhere in src/AVEVA.Historian.Client/Wcf/.
  2. Pull-mode (probe). SDK calls GetHistorianInfo (GETHI) or a discovered Storage-service equivalent and maps the one-shot response. No subscription state required.

Pull-mode is strongly preferred: it matches the SDK's existing WCF style, avoids duplex contracts, and the existing code path in HistorianWcfStatusClient.GetSystemParameter is the right shape. Only fall back to push-mode if Workstream C.3 proves the server has no pull endpoint that returns SF state.

Step 4 — Implement the managed contract method

Once Step 3 picks pull-mode, implement against the WCF contract (likely a new [OperationContract] on IStatusServiceContract2 or a method on IStorageServiceContract). Follow the existing parameter-naming discipline from the resolved ValidateClientCredential blocker: use [MessageParameter(Name = "...")] to match exact server element names — do not let WCF derive them from C# parameter names. See handoff.md "Active Blocker" entry for the 2026-05-04 fix.

Step 5 — Add golden-byte fixtures

Add a request and response fixture under fixtures/protocol/store-forward-status/:

  • request-get-storage-status.bin — bytes the SDK sends.
  • response-get-storage-status-running-normal.bin — server not in SF.
  • response-get-storage-status-active-sf.bin — server actively storing.
  • response-get-storage-status-error.bin — server's SF errored.

Capture sources: the same instrumented native wrapper runs that populate Workstream D. Sanitize hostnames, GUIDs, and timestamps before committing.
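Sanitizing golden bytes is safer when it preserves length, because every later field offset in the fixture stays valid. A minimal Python sketch, assuming the secrets appear as textual GUIDs and as an ASCII hostname in ASCII or UTF-16LE form; binary GUIDs and timestamps are deliberately not handled, since they need the decoded layout first.

```python
import re

GUID_RE = re.compile(
    rb"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}")
ZERO_GUID = b"00000000-0000-0000-0000-000000000000"


def sanitize_fixture(data: bytes, hostname: str) -> bytes:
    """Length-preserving scrub of textual GUIDs and one known ASCII hostname."""
    if not hostname or not hostname.isascii():
        # ASCII guarantees the mask has the same byte length in both encodings.
        raise ValueError("hostname must be a non-empty ASCII string")
    out = GUID_RE.sub(ZERO_GUID, data)  # 36 bytes replaced by 36 bytes
    mask = "x" * len(hostname)
    for enc in ("ascii", "utf-16-le"):
        out = out.replace(hostname.encode(enc), mask.encode(enc))
    return out
```

Usage: after sanitizing, assert that the output length equals the input length and rerun the parse tests against the sanitized file before committing it.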

Step 6 — Replace the synthesized stub

Replace SynthesizeStoreForwardStatus (lines 107-117 of HistorianWcfStatusClient.cs) with a real implementation. Keep the synthesized fallback for the case where the storage service returns a "no SF configured" sentinel — that is not an error condition, it is the normal state for client-only deployments.

Add a unit test class WcfStoreForwardStatusProtocolTests next to the existing WcfDataQueryProtocolTests etc., with golden-byte parse tests using the fixtures from Step 5.

Update the operation status table in README.md:20 from "synthesized defaults (no SF sidecar to probe)" to "live-verified" once the integration test passes.

5. Risks and Gotchas

  1. SF may not be present on the test host. The dev Historian probably has SF disabled by default; turning it on means stopping Runtime DB SQL services, which is invasive. Plan to do capture work on a dedicated sacrificial Historian VM, not the shared dev box.
  2. SF sidecar may require Admin or LocalSystem to query. Any pipe-direct fallback (Workstream E) will fail under standard user accounts. Document the privilege requirement explicitly in the SDK XML doc comments on GetStoreForwardStatusAsync.
  3. State is volatile. Probes that take >100 ms can race against the server's own SF state machine. Capture both request and response in the same instrumented run; do not try to correlate two captures.
  4. Push-mode would force a duplex WCF contract. None of the existing decoded operations use duplex. Adding it widens the managed WCF surface significantly and risks .NET-WCF compatibility issues we have not yet hit. Pull-mode first.
  5. The wrapper's IsBothConnectionRequested (token 0x06006193) path indicates a "primary + partner" topology. Out of scope for this pass per §1, but if the server returns partner data in the same response we must skip-decode (not throw on) unknown trailing bytes.
  6. Open2-only sessions may not receive SF events. handoff.md "Active Blocker" notes the wrapper's full chain (OpenConnection3 after the ValCl rounds) is the path that produces a session the server treats as a real client. SF probes should run from inside that chain, re-using HistorianWcfAuthChainHelper.OpenAuthenticatedConnection, the same call site already used by GetSystemParameter at HistorianWcfStatusClient.cs:42. §7 question 6 tests whether Open2 alone is enough.
  7. HISTORIAN_STORAGE_STATUS field order is not contractual. The struct is C++ inside the closed source. If AVEVA reorders fields between Historian versions, our decoder breaks. Pin the decoder to the Historian server version observed at session open (already exposed via IRetrievalServiceContractN) and reject mismatched versions explicitly with ProtocolEvidenceMissingException. Do not silently best-effort parse.
  8. Sanitization. Pipe names, registry paths, and SF cache directory paths can leak hostnames and account names. Run the rg sanitizer (handoff.md "Next Pickup Steps") after every doc edit.
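Risk 7's version pin amounts to an explicit allow-list check before decoding. A minimal Python sketch: ProtocolEvidenceMissingError is a stand-in for the SDK's ProtocolEvidenceMissingException, and the version string format is unknown, so the allow-list entry is a placeholder to fill only from real captures.

```python
from typing import FrozenSet


class ProtocolEvidenceMissingError(Exception):
    """Python stand-in for the SDK's ProtocolEvidenceMissingException."""


# HYPOTHETICAL: server versions whose HISTORIAN_STORAGE_STATUS layout has been
# captured and pinned by golden-byte fixtures.
VERIFIED_LAYOUT_VERSIONS: FrozenSet[str] = frozenset({"<version from capture>"})


def require_verified_layout(server_version: str,
                            verified: FrozenSet[str] = VERIFIED_LAYOUT_VERSIONS) -> None:
    """Refuse to decode for any server version without captured evidence."""
    if server_version not in verified:
        raise ProtocolEvidenceMissingError(
            f"No captured HISTORIAN_STORAGE_STATUS layout for server version "
            f"{server_version!r}; refusing best-effort decode.")
```

Exact matching rather than a version range is the conservative choice here: because field order is not contractual, even a patch release could reorder fields.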

6. Success Criteria

A real implementation is "done" when all of the following hold:

  1. client.GetStoreForwardStatusAsync() returns Pending = true and Storing = true while the local Historian's SF cache is actively buffering writes (verifiable by stopping the Runtime DB and writing a value).
  2. Returns Pending = false and Storing = false within 5 seconds after the Runtime DB recovers and SF drains.
  3. Returns ErrorOccurred = true and a non-null, actionable Error message when the SF cache itself fails (disk full, pipe closed, etc.).
  4. Returns the synthesized "no SF" shape (all-false) without throwing on a Historian where SF is not configured.
  5. Two new golden-byte unit tests pass (active-SF and idle-SF responses).
  6. ProtocolGuardrailTests no longer needs to exempt GetStoreForwardStatusAsync from any "must throw ProtocolEvidenceMissingException" rule — the method is now evidence-backed.
  7. Live integration test HistorianClientIntegrationTests.GetStoreForwardStatusAsync_ReturnsServerState (to be added) passes when HISTORIAN_HOST is set, skips cleanly otherwise.
  8. README.md:20 operation status table is updated from "synthesized defaults" to "live-verified".
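Criterion 2 is a bounded wait, which integration tests usually express as "poll until the condition holds or a deadline passes". A minimal Python sketch; read would wrap the SDK's status call and done would check the all-false shape (both hypothetical here).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_for(read: Callable[[], T], done: Callable[[T], bool],
             timeout_s: float = 5.0, interval_s: float = 0.25) -> T:
    """Poll read() until done(result) holds or timeout_s elapses.

    Uses a monotonic clock so wall-clock adjustments cannot stretch the window.
    Raises TimeoutError carrying the last observed value if the condition
    never holds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        last = read()
        if done(last):
            return last
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s}s; last={last!r}")
        time.sleep(interval_s)
```

Including the last observed value in the failure message makes a flaky run diagnosable: it distinguishes "still storing" from "drained but Pending stuck".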

7. Open Questions for the Implementer

Resolve these before writing production code:

  1. Does the server expose a pull endpoint that returns the full HISTORIAN_STORAGE_STATUS snapshot, or only push events? (Workstream C.3 answers this.)
  2. What is the binary layout of HISTORIAN_STORAGE_STATUS? (Workstream A.2 + D.1.)
  3. What is the [OperationContract] shape on IStatusServiceContract2.SetStoreForwardEvent? Specifically: parameter count, byte-buffer parameters, and exact MessageParameter names? (Workstream A.4.)
  4. Is the Storage service slot at net.pipe://<host>/Storage and net.tcp://<host>:32568/Storage reachable on a non-Historian-server install, or is the endpoint absent when only the client redistributable is present? (Workstream B + C.1.)
  5. Does the SF status snapshot include partner / redundant SF state inline, or is it returned from a separate call? (Workstream A.1, look for branches under IsBothConnectionRequested.)
  6. Does the SF status read require OpenConnection3 to have succeeded, or is Open2 enough? (Trial: try the discovered pull endpoint after Open2 only, before doing OpenConnection3. If it works, the implementation is much simpler.)
  7. What happens when SF is disabled by configuration vs enabled but idle? Both should map to Pending=false, Storing=false, but the underlying server response may be a sentinel error vs an all-zeros struct. The implementation must distinguish "no SF" (return defaults silently) from "SF errored" (return ErrorOccurred = true).
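Question 7 has a subtle consequence: "SF disabled" and "SF enabled but idle" produce the same public shape, so the distinction must be made on the raw reply before mapping. A minimal Python sketch; sf_configured=False stands for the not-yet-observed "no SF" sentinel and a non-zero error_code for an SF failure, and both encodings are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SfStatus:
    pending: bool = False
    storing: bool = False
    data_stored: bool = False
    error_occurred: bool = False
    error: Optional[str] = None


NO_SF = SfStatus()  # the all-false shape the synthesized defaults produce today


def map_response(sf_configured: bool, error_code: int, error_text: Optional[str],
                 pending: bool = False, storing: bool = False,
                 data_stored: bool = False) -> SfStatus:
    """Hypothetical mapping from a decoded server reply to the public model."""
    if not sf_configured:
        return NO_SF  # normal for client-only deployments; never an error
    if error_code != 0:
        return SfStatus(pending=pending, storing=storing, data_stored=data_stored,
                        error_occurred=True,
                        error=error_text or f"SF error code {error_code}")
    return SfStatus(pending=pending, storing=storing, data_stored=data_stored)
```

Note that map_response(True, 0, None) equals NO_SF: the silent-default branch and the idle branch converge, which is exactly why success criterion 4 and the "SF errored" path need separate fixtures.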

8. Out of Scope

Explicitly not part of this plan:

  • SF write-back (the project mission is read-only; IStorageServiceContract.AddStreamValues etc. stay unimplemented).
  • Setting SF parameters (IStorageServiceContract.SetStoreForwardParameter).
  • Redundant-partner SF aggregation (HistorianStoreForwardStatus.AddPartnerStoreForwardStatus).
  • Reverse-engineering the on-disk SF cache file format beyond presence / file count (Workstream E is a fallback, not a primary deliverable).
  • Anything in the aahClientCommon.CSFConnection.StartStoreforward / SetStorageStopped / SetTagSynchronized write surface.