Plan two reverse-engineering campaigns: write commands + store/forward cache

docs/plans/write-commands-reverse-engineering.md (425 lines):
  Plan for adding WriteValueAsync (AddS2 stream values), EnsureTags2 for
  analog/discrete/string tags, and DelT for sandbox cleanup. Hard safety
  rules center on a dedicated sandbox tag gated by env var, time-bounded
  writes, SQL ground-truth verification per session, explicit rollback.
  Five-step RE workflow mirrors the read/event decode (static IL discovery
  -> instrument-wcf-writemessage capture -> instrument-wcf-readmessage
  capture -> byte/IL alignment -> managed serializer + golden-byte tests).
  Risks call out auth-chain unknowns, parameter-name-mismatch class,
  silent-success failure modes, History-vs-Storage service question.

docs/plans/store-forward-cache-reverse-engineering.md (501 lines):
  Plan for replacing the synthesized GetStoreForwardStatusAsync with a
  real implementation. Architecture investigation already partially
  answered via IL inspection during planning: ArchestrA.HistorianAccess.
  GetStoreForwardStatus (token 0x06006187) reads an in-process C struct
  via calli to mdas_GetStorageStatus, kept current by server-pushed WCF
  callbacks (IStatusServiceContract2.SetStoreForwardEvent). CSFConnection.
  GetSFPipeName indicates a separate Named Pipe sidecar exists when SF
  is configured. Five parallelizable discovery workstreams, six concrete
  RE steps with cited tokens, eight risks, eight success criteria.

Both plans deliberately produce no code changes and no captures. They
exist so the next implementer can start with full context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dohertj2
2026-05-04 07:16:32 -04:00
parent 5310952ab2
commit 6f01b83313
2 changed files with 926 additions and 0 deletions
@@ -0,0 +1,501 @@
# Store/Forward Cache Reverse-Engineering Plan
Last updated: 2026-05-04
This document plans the reverse-engineering effort needed to replace the
synthesized `GetStoreForwardStatusAsync` in
`src/AVEVA.Historian.Client/Wcf/HistorianWcfStatusClient.cs` (lines 101-117)
with a real, evidence-backed implementation. It is a *plan*, not the work
itself. No code changes; no captures collected.
Read this together with:
- `docs/reverse-engineering/handoff.md` — read/event protocol decoding state
- `src/AVEVA.Historian.Client/Wcf/Contracts/IStorageServiceContract.cs` — the
WCF contract that already declares the SF parameter ops
- `src/AVEVA.Historian.Client/Models/HistorianStoreForwardStatus.cs` — the
output model the implementation must populate
## 1. Goal
"SF support works" means, end-to-end:
1. **Primary deliverable.** `client.GetStoreForwardStatusAsync()` against a
live local Historian returns a `HistorianStoreForwardStatus` whose
`Pending`, `Storing`, `DataStored`, `ErrorOccurred`, `Error`, `ServerName`,
and `ConnectionKind` fields reflect actual server-reported state, not the
synthesized defaults at
`HistorianWcfStatusClient.cs:107-117`.
2. **Secondary deliverable.** The SDK can also answer the higher-level
"is SF currently buffering?" question accurately when the runtime DB is
*down*, not just when it is up. That is the case the real native client
handles correctly and where the synthesized default (`Storing = false`,
`ErrorOccurred = false`) is silently wrong today.
3. **Non-goals.** Writing into SF, replaying SF buffers, configuring SF
parameters, redundant-partner SF aggregation
(`HistorianStoreForwardStatus.AddPartnerStoreForwardStatus`,
token `0x060060B8`). Read-only matches the project mission in
`CLAUDE.md`.
The success bar is parity with the native wrapper's
`ArchestrA.HistorianAccess.GetStoreForwardStatus`
(MD token `0x06006186` in `current/aahClientManaged.dll`),
not a superset.
## 2. Architecture Investigation (open questions, in priority order)
Answer these before writing any production code. Each has a discovery action
in §3.
### Q1. Is SF status read from a local in-process struct, a separate WCF endpoint, or a Named Pipe IPC?
Current evidence: **all three are plausible, but the wrapper actually uses
"in-process struct kept current by server-pushed WCF events"**. Specifically:
- `ArchestrA.HistorianAccess.GetStoreForwardStatus`
(token `0x06006187`, the private 2-arg overload) does *not* call WCF.
It calls `mdas_GetStorageStatus` (a `calli` against the
`INSQL_MDAS_ERROR (IntPtr handle, uint, HISTORIAN_STORAGE_STATUS*)` C
signature in `current/aahClient.dll` exports) and then maps the result
through `HistorianAccessUtil.ConvertUnmanagedSFStorageStatusToManagedStorageStatus`
(token `0x060060E4`).
- Mutators like `CConfigStatusClient.SetMdasStoreForwardEvent`
(token `0x060029DC`) and `aahClientCommon.CStatus.SetStoreForwardEvent`
(token `0x06002A04`) are wired to the WCF callback
`IStatusServiceContract2.SetStoreForwardEvent`
(`StatusServiceContract.IStatusServiceContract2.SetStoreForwardEvent`,
token `0x06005F57`). The server *pushes* SF state changes; the client
caches them.
- Confirm: read the IL of token `0x06006187` and verify the only native call
is `mdas_GetStorageStatus`. The first 200 instructions already show this sequence:
`GetClient(ConnectionIndex)` → `calli` against the
`INSQL_MDAS_ERROR(IntPtr,uint,HISTORIAN_STORAGE_STATUS*)` signature →
`ConvertUnmanagedSFStorageStatusToManagedStorageStatus`.
Implication: **the SDK cannot ship a synchronous probe that calls one WCF
operation and gets the answer**. It must subscribe to the same status-event
stream the native wrapper subscribes to, or call a status query that returns
the cached snapshot from the server.
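
The push model described above can be pictured with a minimal Python sketch (the SDK itself is C#; Python is used here only for brevity). The class and method names are hypothetical, not taken from the wrapper: a callback updates a locked snapshot, and the status read returns that snapshot without any network call.

```python
import threading
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StoreForwardSnapshot:
    # Field names mirror HistorianStoreForwardStatus; defaults are the idle state.
    pending: bool = False
    storing: bool = False
    data_stored: bool = False
    error_occurred: bool = False
    error: str = ""


class StoreForwardCache:
    """Holds the last pushed SF state; reads never touch the network."""

    def __init__(self):
        self._lock = threading.Lock()
        self._snapshot = StoreForwardSnapshot()

    def on_store_forward_event(self, **changed):
        # Hypothetical callback entry point: invoked when the server pushes a change.
        with self._lock:
            self._snapshot = replace(self._snapshot, **changed)

    def get_status(self):
        # Returns the cached snapshot, which is immutable and safe to share.
        with self._lock:
            return self._snapshot
```

The consequence for a pull-only client is visible here: without the callback feeding the cache, `get_status` can only ever return the defaults.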
### Q2. Is there a single-shot WCF query that returns the same snapshot?
Likely yes. Hypothesis: `IStatusServiceContract2.GetHistorianInfo`
(`GETHI`, see `IStatusServiceContract2.cs:24-30`) returns a multi-key status
blob whose schema includes SF state. Alternative: a status-only key passed to
`GetSystemParameter` (already plumbed via `HistorianWcfStatusClient.GetSystemParameterAsync`).
Both are testable without writing protocol code by sending probe payloads
and observing the response shape.
### Q3. Does SF have its own sidecar process / pipe / WCF endpoint we are missing?
Strong evidence the answer is yes when SF is *enabled*:
- `aahClientCommon.CSFConnection.GetSFPipeName` (token `0x06004B72`),
`GetSFPath` (`0x06004B71`), `IsConnected` (`0x06004B73`), `IsEnabled`
(`0x06004B6F`) — there is a separately-named SF Named Pipe distinct
from the main MDAS pipe.
- `aahClientCommon.CSFConnection.StartStoreforward` (token `0x06004BC6`).
- `IStorageServiceContract` already declares `GetStoreForwardParameter`
/ `SetStoreForwardParameter` (`GetSFP`/`SetSFP`,
see `IStorageServiceContract.cs:81-85`) and `Storage` is a separate
WCF service slot in `HistorianWcfServiceNames.cs:15`.
- `CWcfConfig.ConfigurePipeProxy<IStorageServiceContract>` (token
`0x06004B1C`) and `CWcfConfig.ConfigureTcpProxy<IStorageServiceContract>`
(token `0x06004B1B`) confirm the storage proxy supports both transports —
same dual-transport pattern the History/Retrieval proxies use.
- `CStorageEngineConsoleClient.GetPipeNameStr` (token `0x06000E2D`) /
`GetFullPipeNameStr` (token `0x06000E2E`) wraps the storage-engine
console pipe via `STransactPipeClient2` (a *non-WCF* binary pipe
protocol).
Open: **is the SF sidecar even running on the dev host this SDK is being
tested against?** `handoff.md` does not record an SF process being
observed. `aveva-install-x64/` and `aveva-install-x86/` ship only DLLs
(no `aahStoreForwardClient.exe` / `aahSFClient.exe` / similar). The SF
sidecar is part of the Historian *server* install, not the client
redistributable. So:
- On the developer machine, SF is reachable only because the local
Historian server is installed.
- A pure-client install (the deployment target this SDK ships into) may
*never* have SF.
This shapes the success criteria: when SF is not configured, a correct
implementation returns `Pending = false`, `ErrorOccurred = false`,
`DataStored = false`, `Storing = false` — i.e. the same shape the
synthesized defaults produce today. The interesting case is *when SF is
configured and active*.
### Q4. Is SF state authoritative on the Historian server or on a per-client basis?
Native wrapper reads it from `HistorianClient*` (the per-connection C++
object). This means it is *connection-scoped* server-pushed state. We
do not need to enumerate cluster-wide SF state — the server reports
"my SF buffer for this client's writes" only. This matches our read-only
mission: we are not a writer, so the only SF state of interest is the
server-side cache for *other* writers, which the server can report to
us as a passive observer.
### Q5. Does any SF probe require Admin?
`CSFConnection.GetSFPipeName` returns a kernel object name. Reading
from that pipe requires the pipe ACL to permit the caller. If the SF pipe is
ACL'd to `LocalSystem` only, the SDK cannot read it without
impersonation — and the SDK runs as the calling process. This is a
hard limit, not a bug.
## 3. Discovery Workstreams
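Workstreams A, B, and C can run in parallel; D is sequential after A and C,
and E is a fallback that runs only if D fails. A through C require no live
server beyond what the existing test rig already has; D needs an
actively-storing SF sidecar.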
### Workstream A — Static IL inspection (parallel-safe, read-only)
Owner action items, in order:
1. Dump full IL of token `0x06006187`
(`HistorianAccess.GetStoreForwardStatus(ConnectionIndex,out)`):
```powershell
dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- `
dnlib-method current\aahClientManaged.dll HistorianAccess.GetStoreForwardStatus --instructions
```
Save under `docs/reverse-engineering/historianaccess-getstoreforwardstatus-il-latest.txt`.
Confirm the `calli` target signature
`INSQL_MDAS_ERROR(IntPtr,uint,HISTORIAN_STORAGE_STATUS*)` and that
the method touches no WCF entry points.
2. Dump IL of `HistorianAccessUtil.ConvertUnmanagedSFStorageStatusToManagedStorageStatus`
(token `0x060060E4`). This is the unmanaged→managed mapping; it
tells us which fields of `HISTORIAN_STORAGE_STATUS` populate which
fields of `HistorianStoreForwardStatus`. We will need the same
mapping in reverse on the wire response.
3. Inventory every method that *writes* into the local SF status
struct:
```
methods current\aahClientManaged.dll SetStoreForward
methods current\aahClientManaged.dll SetMdasStoreForward
```
The known set as of writing:
`CConfigStatusClient.SetMdasStoreForwardEvent` (`0x060029DC`),
`aahClientCommon.CStatus.SetStoreForwardEvent` (`0x06002A04`),
`CStatusConnectionDirect.SetStoreForwardEvent` (`0x06004DF8`),
`CStatusConnectionWCF.SetStoreForwardEvent` (`0x06004E4E`),
`CClientCommon.SetStoreForwardEventOnServer` (`0x06002EC0`).
The `WCF` variant is the one whose IL maps onto
`IStatusServiceContract2.SetStoreForwardEvent`
(token `0x06005F57`) — read its IL and document the request/response
shape.
4. Dump IL of `IStatusServiceContract2.SetStoreForwardEvent`
(`0x06005F57`) parameter types. The `[OperationContract]`
declaration in the wrapper assembly already encodes the wire shape;
this gives us the bytes the server pushes us.
### Workstream B — Install inventory (parallel-safe)
1. Inventory `aveva-install-x64\` and `aveva-install-x86\` for any
binary whose name contains `Store`, `Forward`, `SF`, `Cache`,
`Spool`. As of this checkout: **none**, only DLLs. Confirm.
2. Inventory the deployed Historian server (out-of-band; not in this
repo) for `aahStoreForwardClient.exe`,
`aahStoreForwardServer.exe`, `aahSFCache.exe`, or any service
registered with `Description` matching `*Forward*`. Capture the
service name, account identity, and pipe ACLs (`accesschk -wuvc`).
3. Walk the registry: `HKLM\SOFTWARE\ArchestrA\Historian` and any
sub-key matching `*StoreForward*`, recording paths and pipe names.
Sanitize before committing.
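
The file-name inventory in item 1 can be scripted. The following is a minimal Python sketch, not part of the existing tooling; the function name is hypothetical and the keyword list is copied from item 1. The two-letter keyword `sf` matches many unrelated names, so the output is a candidate list for manual review, not a verdict.

```python
from pathlib import Path

# Keywords from Workstream B item 1, lower-cased for case-insensitive matching.
KEYWORDS = ("store", "forward", "sf", "cache", "spool")


def find_sf_candidates(root):
    """Return sorted relative paths of files whose name contains any keyword."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        if path.is_file() and any(k in path.name.lower() for k in KEYWORDS):
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)
```

Run it once per install directory and record the result, including an empty result, so the "none, only DLLs" observation is backed by a reproducible check.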
### Workstream C — WCF probe (parallel-safe)
Use the existing `wcf-probe` and `wcf-status` subcommands of
`tools\AVEVA.Historian.ReverseEngineering`:
1. `wcf-probe $env:HISTORIAN_HOST 32568` — confirm `Storage/GetV` is
reachable. (It is the third service slot in
`HistorianWcfServiceNames`.) Document the returned interface
version.
2. `wcf-status $env:HISTORIAN_HOST 32568 <param-name>` — sweep
plausible SF parameter names (`SF.Status`, `StoreForward.State`,
`SFCacheBytes`, etc.) through `GetSystemParameter` and record what
the server accepts. Cheap, read-only, no session needed beyond the
already-decoded auth chain.
3. Probe `GetHistorianInfo` (`GETHI`,
`IStatusServiceContract2.cs:24`) with the byte request shape used
by the native wrapper. The request bytes are visible if we run
`instrument-wcf-readquery`-style instrumentation against
`CConfigStatusClient.SetMdasStoreForwardEvent`'s upstream caller —
see Workstream D.
### Workstream D — Native capture (sequential after A and C)
Two captures are needed:
1. **Native call to `mdas_GetStorageStatus`.** Run
`tools\AVEVA.Historian.NativeTraceHarness` with a new scenario
`--scenario sfstatus` (to be added) that invokes
`HistorianAccess.GetStoreForwardStatus()` and dumps the
`HISTORIAN_STORAGE_STATUS` C struct memory before the managed
conversion runs. This pins the binary layout of the struct
(offsets, field widths, endianness) without us guessing.
2. **WCF push of SF events.** Configure the local Historian to enter
SF mode (stop the runtime DB writer; let the writer's queue
trigger SF) and capture the WCF traffic with the existing
`instrument-wcf-readquery` sibling — i.e. add an
`instrument-wcf-setstoreforwardevent` subcommand that
IL-rewrites `aahClientManaged.dll` to log the bytes the server
sends to `IStatusServiceContract2.SetStoreForwardEvent`. Save
the rewrite under `docs/reverse-engineering/dnlib-write-copy/`,
never `current/`.
Workstream D is the only step that needs an actively-storing SF
sidecar. Plan: stop the Historian Runtime DB SQL service, write a
single test point via the wrapper's writer harness, and capture the
SF event push, then restart Runtime DB and capture the
"end-of-SF / data drained" push.
### Workstream E — On-disk cache (only if Workstream D fails)
If the WCF push protocol turns out to be impractical to reproduce
(e.g. requires duplex contract, callback channel, or a server-side
session-bind we cannot match from our managed client), fall back to
inspecting the on-disk SF cache directly. Steps:
1. Resolve `CSFConnection.GetSFPath` IL to find the cache directory
convention (likely `%ProgramData%\ArchestrA\Historian\Cache\` or
similar — to be confirmed, **never assume the path**).
2. Inventory file types: `.sfdata`, `.sfindex`, `.cache` — whatever
the directory contains.
3. Decode the file header. The presence/size of `.sfdata` files is
sufficient to populate `DataStored` and `Pending`; we do not
need to decode the value payload.
This fallback is only for `DataStored` / `Pending`. `Storing` and
`Error` fundamentally require a live server-state read.
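
As a rough illustration of the fallback, the Python sketch below derives the two flags from file presence and size alone. It rests on assumptions that Workstream E still has to confirm: the `.sfdata` extension is only the placeholder used above, the directory must come from `GetSFPath` rather than a guess, and treating any non-empty file as both `DataStored` and `Pending` is a simplification.

```python
from pathlib import Path


def cache_presence(cache_dir, pattern="*.sfdata"):
    """Summarize SF cache files without decoding their contents."""
    files = [p for p in Path(cache_dir).glob(pattern) if p.is_file()]
    total = sum(p.stat().st_size for p in files)
    has_data = total > 0
    # Assumption: non-empty cache files imply both stored and pending data.
    return {"DataStored": has_data, "Pending": has_data, "bytes": total}
```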
## 4. Concrete Reverse-Engineering Steps (execution order)
Mirrors the read/event decoding workflow that succeeded for raw
queries.
### Step 1 — Find native methods that touch SF
Already done; baseline evidence is recorded in §2 Q1/Q3 above. Key
tokens to reference:
- `0x06006186`, `0x06006187`: public/private
`HistorianAccess.GetStoreForwardStatus`
- `0x060060E4`:
`HistorianAccessUtil.ConvertUnmanagedSFStorageStatusToManagedStorageStatus`
- `0x060029DC`: `CConfigStatusClient.SetMdasStoreForwardEvent`
- `0x06002A04`: `aahClientCommon.CStatus.SetStoreForwardEvent`
- `0x06002DFF`: `aahClientCommon.CClientCommon.IsInStoreForward`
- `0x06002E18`: `aahClientCommon.CClientCommon.SetStoreForwardParams`
- `0x06002EC0`: `CClientCommon.SetStoreForwardEventOnServer`
- `0x06004BC6`: `aahClientCommon.CSFConnection.StartStoreforward`
- `0x06004B6F`..`0x06004B73`: CSFConnection getters (path, pipe,
enabled, connected)
- `0x06004DF8`, `0x06004E4E`: direct vs WCF status connections
- `0x06005F57`: `IStatusServiceContract2.SetStoreForwardEvent` MD ref
- `0x06006193`: `HistorianAccess.IsBothConnectionRequested` (used by
the public arity-0 GetStoreForwardStatus to decide whether to fan
out to a redundant partner)
### Step 2 — Decode `HISTORIAN_STORAGE_STATUS` layout
Run Workstream A.2 (decode token `0x060060E4`) and Workstream D.1
(native struct memory dump). Together they pin the field layout.
The managed struct fields we already know we need to populate
(from `HistorianStoreForwardStatus.cs`):
`ServerName`, `Pending`, `ErrorOccurred`, `Error`, `DataStored`,
`Storing`, `ConnectionKind`. The native struct is expected to have
at least seven fields plus padding; the captures decide the actual
count. Express the mapping as a comment table in
the implementation.
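
To show what a comment-table mapping could look like, here is a Python sketch built on an entirely hypothetical layout: four little-endian 32-bit fields. None of the offsets, widths, or field names below are evidence; they stand in for whatever Workstream A.2 and D.1 actually recover.

```python
import struct

# HYPOTHETICAL layout, for illustration only. Real offsets and widths must
# come from the IL of token 0x060060E4 and the native memory dump.
#   offset  width  native field (assumed)  managed field
#   0       4      pending flag            Pending
#   4       4      storing flag            Storing
#   8       4      data-stored flag        DataStored
#   12      4      error code (signed)     ErrorOccurred / Error
_LAYOUT = struct.Struct("<IIIi")


def decode_storage_status(raw):
    """Map the assumed native struct bytes onto managed-style field names."""
    pending, storing, data_stored, error_code = _LAYOUT.unpack_from(raw)
    return {
        "Pending": bool(pending),
        "Storing": bool(storing),
        "DataStored": bool(data_stored),
        "ErrorOccurred": error_code != 0,
        "Error": "" if error_code == 0 else f"native error {error_code}",
    }
```

Keeping the table next to the format string means a layout change shows up as a one-line diff that a golden-byte test will catch.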
### Step 3 — Decide the wire model
Two possible implementations:
1. **Push-mode (native parity).** SDK opens an authenticated WCF
session that the server treats as a status subscriber, listens
for `IStatusServiceContract2.SetStoreForwardEvent` callbacks,
maintains a local cache, and `GetStoreForwardStatusAsync`
returns from the cache. This requires WCF duplex
(`CallbackContract`) which is not currently exercised
anywhere in `src/AVEVA.Historian.Client/Wcf/`.
2. **Pull-mode (probe).** SDK calls `GetHistorianInfo` (`GETHI`)
or a discovered `Storage`-service equivalent and maps the
one-shot response. No subscription state required.
Pull-mode is strongly preferred: it matches the SDK's existing
WCF style, avoids duplex contracts, and the existing code path
in `HistorianWcfStatusClient.GetSystemParameter` is the right
shape. Only fall back to push-mode if Workstream C.3 proves the
server has no pull endpoint that returns SF state.
### Step 4 — Implement the managed contract method
Once Step 3 picks pull-mode, implement against the WCF contract
(likely a new `[OperationContract]` on `IStatusServiceContract2`
or a method on `IStorageServiceContract`). Follow the existing
parameter-naming discipline from the resolved
`ValidateClientCredential` blocker:
**use `[MessageParameter(Name = "...")]` to match exact server
element names — do not let WCF derive them from C# parameter
names.** See `handoff.md` "Active Blocker" entry for the
2026-05-04 fix.
### Step 5 — Add golden-byte fixtures
Add a request and response fixture under
`fixtures/protocol/store-forward-status/`:
- `request-get-storage-status.bin` — bytes the SDK sends.
- `response-get-storage-status-running-normal.bin` — server
not in SF.
- `response-get-storage-status-active-sf.bin` — server actively
storing.
- `response-get-storage-status-error.bin` — server's SF errored.
Capture sources: the same instrumented native wrapper runs that
populate Workstream D. Sanitize hostnames, GUIDs, and timestamps
before committing.
### Step 6 — Replace the synthesized stub
Replace `SynthesizeStoreForwardStatus` (lines 107-117 of
`HistorianWcfStatusClient.cs`) with a real implementation. Keep
the synthesized fallback for the case where the storage service
returns a "no SF configured" sentinel — that is *not* an error
condition, it is the normal state for client-only deployments.
Add a unit test class `WcfStoreForwardStatusProtocolTests` next
to the existing `WcfDataQueryProtocolTests` etc., with golden-byte
parse tests using the fixtures from Step 5.
Update the operation status table in `README.md:20` from
"synthesized defaults (no SF sidecar to probe)" to
"live-verified" once the integration test passes.
## 5. Risks and Gotchas
1. **SF may not be present on the test host.** The dev Historian
probably has SF disabled by default; turning it on means
stopping Runtime DB SQL services, which is invasive. Plan to do
capture work on a dedicated sacrificial Historian VM, not the
shared dev box.
2. **SF sidecar may require Admin or LocalSystem to query.** Any
pipe-direct fallback (Workstream E) will fail under standard
user accounts. Document the privilege requirement explicitly
in the SDK XML doc comments on `GetStoreForwardStatusAsync`.
3. **State is volatile.** Probes that take >100 ms can race
against the server's own SF state machine. Capture *both*
request and response in the same instrumented run; do not
try to correlate two captures.
4. **Push-mode would force a duplex WCF contract.** None of the
existing decoded operations use duplex. Adding it widens the
managed WCF surface significantly and risks .NET-WCF
compatibility issues we have not yet hit. Pull-mode first.
5. **The wrapper's `IsBothConnectionRequested` (token `0x06006193`)
path indicates a "primary + partner" topology.** Out of scope
for this pass per §1, but if the server returns partner data
in the same response we must skip-decode (not throw on)
unknown trailing bytes.
6. **`Open2`-only sessions never receive SF events.** `handoff.md`
"Active Blocker" notes the wrapper's full chain
(`OpenConnection3` after the `ValCl` rounds) is the path that
produces a session the server treats as a real client. SF
probes must run from inside that chain — re-using
`HistorianWcfAuthChainHelper.OpenAuthenticatedConnection`,
the same call site already used by `GetSystemParameter` at
`HistorianWcfStatusClient.cs:42`.
7. **`HISTORIAN_STORAGE_STATUS` field order is not contractual.**
The struct is C++ inside the closed source. If AVEVA reorders
fields between Historian versions, our decoder breaks. Pin the
decoder to the Historian server version observed at session
open (already exposed via `IRetrievalServiceContractN`) and
reject mismatched versions explicitly with
`ProtocolEvidenceMissingException`. Do not silently best-effort
parse.
8. **Sanitization.** Pipe names, registry paths, and SF cache
directory paths can leak hostnames and account names. Run the
`rg` sanitizer (handoff.md "Next Pickup Steps") after every
doc edit.
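
Risks 5 and 7 pull in opposite directions: reject unknown server versions, but tolerate unknown trailing bytes. A minimal Python sketch of a guard that does both follows. The version string and struct size are placeholders, and the exception class is a stand-in for the SDK's `ProtocolEvidenceMissingException`.

```python
class ProtocolEvidenceMissingError(Exception):
    """Stand-in for the SDK's ProtocolEvidenceMissingException."""


# Hypothetical table: server versions with a verified capture, mapped to the
# struct size in bytes that the capture established.
VERIFIED_STRUCT_SIZES = {"example-version-a": 16}


def select_status_bytes(server_version, payload):
    """Return exactly the verified struct bytes, or refuse to parse."""
    size = VERIFIED_STRUCT_SIZES.get(server_version)
    if size is None:
        raise ProtocolEvidenceMissingError(
            f"no verified layout for server version {server_version!r}")
    if len(payload) < size:
        raise ProtocolEvidenceMissingError(
            f"payload has {len(payload)} bytes, expected at least {size}")
    # Trailing bytes (for example partner data) are skipped, not rejected.
    return payload[:size]
```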
## 6. Success Criteria
A real implementation is "done" when all of the following hold:
1. `client.GetStoreForwardStatusAsync()` returns
`Pending = true` and `Storing = true` while the local
Historian's SF cache is actively buffering writes (verifiable
by stopping the Runtime DB and writing a value).
2. Returns `Pending = false` and `Storing = false` within
≤ 5 seconds after the Runtime DB recovers and SF drains.
3. Returns `ErrorOccurred = true` and a non-null, actionable
`Error` message when the SF cache itself fails (disk full,
pipe closed, etc.).
4. Returns the synthesized "no SF" shape (all-false) without
throwing on a Historian where SF is not configured.
5. Two new golden-byte unit tests pass (active-SF and idle-SF
responses).
6. `ProtocolGuardrailTests` no longer needs to exempt
`GetStoreForwardStatusAsync` from any "must throw
`ProtocolEvidenceMissingException`" rule — the method is now
evidence-backed.
7. Live integration test
`HistorianClientIntegrationTests.GetStoreForwardStatusAsync_ReturnsServerState`
(to be added) passes when `HISTORIAN_HOST` is set, skips
cleanly otherwise.
8. `README.md:20` operation status table is updated from
"synthesized defaults" to "live-verified".
## 7. Open Questions for the Implementer
Resolve these before writing production code:
1. Does the server expose a *pull* endpoint that returns the full
`HISTORIAN_STORAGE_STATUS` snapshot, or only push events?
(Workstream C.3 answers this.)
2. What is the binary layout of `HISTORIAN_STORAGE_STATUS`?
(Workstream A.2 + D.1.)
3. What is the `[OperationContract]` shape on
`IStatusServiceContract2.SetStoreForwardEvent`? Specifically:
parameter count, byte-buffer parameters, and exact
`MessageParameter` names? (Workstream A.4.)
4. Is the `Storage` service slot at
`net.pipe://<host>/Storage` and `net.tcp://<host>:32568/Storage`
reachable on a non-Historian-server install? Or is the endpoint
simply not found when only the client redistributable is
present? (Workstream B + C.1.)
5. Does the SF status snapshot include partner / redundant SF
state inline, or is it returned from a separate call?
(Workstream A.1, look for branches under
`IsBothConnectionRequested`.)
6. Does the SF status read require `OpenConnection3` to have
succeeded, or is `Open2` enough? (Trial: try the discovered
pull endpoint after `Open2` only, before doing
`OpenConnection3`. If it works, the implementation is much
simpler.)
7. What happens when SF is *disabled* by configuration vs
*enabled but idle*? Both should map to `Pending=false,
Storing=false`, but the underlying server response may be a
sentinel error vs an all-zeros struct. The implementation must
distinguish "no SF" (return defaults silently) from "SF errored"
(return `ErrorOccurred = true`).
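
Question 7 amounts to a three-way classification. The Python sketch below shows one way to keep "no SF" and "SF errored" apart; the two inputs (a sentinel flag and an error code) are assumptions about what the server response will expose, not observed behavior.

```python
from enum import Enum


class SfOutcome(Enum):
    NOT_CONFIGURED = "not_configured"
    IDLE = "idle"
    ERRORED = "errored"


def classify(no_sf_sentinel, error_code):
    """Decide which case a response represents (inputs are hypothetical)."""
    if no_sf_sentinel:
        return SfOutcome.NOT_CONFIGURED
    return SfOutcome.ERRORED if error_code != 0 else SfOutcome.IDLE


def to_status(outcome, error_code=0):
    """NOT_CONFIGURED and IDLE share the all-false shape; ERRORED does not."""
    errored = outcome is SfOutcome.ERRORED
    return {
        "Pending": False,
        "Storing": False,
        "ErrorOccurred": errored,
        "Error": f"store/forward error {error_code}" if errored else "",
    }
```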
## 8. Out of Scope
Explicitly not part of this plan:
- SF write-back (the project mission is read-only;
`IStorageServiceContract.AddStreamValues` etc. stay
unimplemented).
- Setting SF parameters
(`IStorageServiceContract.SetStoreForwardParameter`).
- Redundant-partner SF aggregation
(`HistorianStoreForwardStatus.AddPartnerStoreForwardStatus`).
- Reverse-engineering the on-disk SF cache file format beyond
presence / file count (Workstream E is a fallback, not a
primary deliverable).
- Anything in the
`aahClientCommon.CSFConnection.StartStoreforward` /
`SetStorageStopped` / `SetTagSynchronized` write surface.
@@ -0,0 +1,425 @@
# Plan: Reverse-Engineering Write Commands
Status: PLAN ONLY (no implementation yet). Extends the read/event
work in `docs/reverse-engineering/handoff.md` (2026-05-04).
## 1. Goal
"Write commands work" means the production SDK at
`src/AVEVA.Historian.Client/` performs these operations end-to-end
against a live AVEVA Historian, with parsed responses, golden-byte
unit tests, and gated live integration tests.
In scope:
1. **`AddS2` (`IHistoryServiceContract2.AddStreamValues2`)** — push
one or more timestamped samples for an existing historized tag.
Primary use case: an OPC UA driver pushing values to the
Historian.
2. **`EnsT2` (`IHistoryServiceContract2.EnsureTags2`) for
analog/discrete/string data tags** — partially decoded for the
`CM_EVENT` AnE-event tag in
`src/AVEVA.Historian.Client/Wcf/HistorianAddTagsProtocol.cs`. The
`CTagMetadata` byte layout for `CDataType` ∈ {1, 2, 3, 4} is the
new evidence target.
3. **`DelT` (`IHistoryServiceContract2.DeleteTags`)** — needed for
safe sandbox cleanup during RE.
4. **`ModifyData` / `DeleteData`** — only if §3.4 method discovery
confirms a managed WCF op exists.
Out of scope: tag-extended-properties (`AddTEx` / `DelTep`),
`ExKey`, `SetSFP`, snapshot send (`SendSnapshotBegin/End/Snapshot`),
tag-id-pair maintenance, shard splits, flush ops, all
`IStorageServiceContract` writes (engine-internal — see §6.d), event
writes (events come from AVEVA AnE, we only read them), schema
changes (forbidden over the wire).
## 2. Safety Constraints
The Runtime DB is production data even on `localhost`. `AddS2`
writes are persistent — they go to compressed history blocks and
cannot be removed through any client-facing surface.
Hard rules:
1. **Single dedicated sandbox tag.** Add env var
`HISTORIAN_WRITE_SANDBOX_TAG = "RetestSdkWriteSandbox"`. Live
write tests refuse to run when unset, even when other
`HISTORIAN_*` vars are set.
2. **Never write to** any tag named in `HISTORIAN_TEST_TAG`,
`HISTORIAN_TAG_FILTER`, the docs, the test fixtures, or the
captured RE ndjson. The read fixture
`OtOpcUaParityTest_001.Counter` is OFF-LIMITS for writes.
3. **Documented rollback.** Every write session records its time
window to
`artifacts/reverse-engineering/write-sandbox-window-<stamp>.json`
so SQL `SELECT * FROM History WHERE wwTagKey = @tagKey AND DateTime
BETWEEN @s AND @e` can identify exactly which rows the session
inserted. Tag rollback is via decoded `DelT` (§3.3) once
available, or manually via System Management Console until then.
4. **Time bounds on writes.** Every `AddS2` test uses
`DateTime.UtcNow` ± a small offset, so writes always land inside
the live `RealTimeWindow` / `FutureTimeThreshold` system
parameters and cannot accidentally overwrite older blocks.
5. **No customer / corporate hosts.** `localhost` only.
6. **Sanitization scan after every session:**
`rg -n "(?i)(password|credential|secret|token|<known-sensitive-host>|<known-sensitive-machine>|<known-sensitive-user>)" docs\reverse-engineering scripts tools docs\plans`.
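
Rules 1 to 3 can be enforced mechanically before any write is attempted. The Python sketch below is an illustration of that gate, not existing test code: the JSON field names are hypothetical, and comparing the sandbox tag to `HISTORIAN_TAG_FILTER` by exact match is a simplification if that variable holds a pattern.

```python
import json
from datetime import datetime, timezone

SANDBOX_ENV = "HISTORIAN_WRITE_SANDBOX_TAG"
PROTECTED_ENVS = ("HISTORIAN_TEST_TAG", "HISTORIAN_TAG_FILTER")


def resolve_sandbox_tag(env):
    """Return the sandbox tag, or refuse when it is unset or protected."""
    tag = env.get(SANDBOX_ENV, "").strip()
    if not tag:
        raise RuntimeError(f"{SANDBOX_ENV} is not set; refusing to write")
    protected = {env.get(name, "").strip() for name in PROTECTED_ENVS} - {""}
    if tag in protected:
        raise RuntimeError(f"{tag!r} is a read fixture; refusing to write")
    return tag


def window_record(tag, start, end):
    """Serialize the session's write window for later SQL verification."""
    return json.dumps({
        "tag": tag,
        "startUtc": start.astimezone(timezone.utc).isoformat(),
        "endUtc": end.astimezone(timezone.utc).isoformat(),
    })
```

Passing the environment in as a mapping (for example `os.environ`) keeps the gate testable without touching the real process environment.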
Soft rules:
- Use a separate captures dir
(`artifacts/reverse-engineering/instrumented-wcf-writemessage-writes/`)
so write captures don't contaminate the existing read/event
ndjson.
- New integration tests follow the existing gating pattern in
`tests/AVEVA.Historian.Client.Tests/HistorianClientIntegrationTests.cs`
(`Skip = ...` when env var unset).
## 3. Discovery Workstreams
### 3.1 EnsT2 for analog/discrete/string tags (priority 1)
- WCF op: `aa/Hist/EnsT2`.
- Contract:
`src/AVEVA.Historian.Client/Wcf/Contracts/IHistoryServiceContract2.cs:82-89`,
already declared with `[MessageParameter(Name = "InBuff" / "OutBuff")]`.
- Existing code: `HistorianAddTagsProtocol.SerializeCmEventCTagMetadata`
builds the `CDataType=5` (event) shape.
- Missing: the `CTagMetadata` byte layout for `CDataType ∈ {1, 2,
3, 4}` (analog double, discrete, string, analog int per the
type-code table in `data-query-request-ctor-il-latest.txt`);
whether the optional-mask `0x0086` and the 5-byte trailer
`2F 27 01 01 01` change per type; analog engineering-units / range
/ deadband fields (likely populate the bytes that are zero in the
event-tag fixture).
### 3.2 AddS2 stream values (priority 1)
- WCF op: `aa/Hist/AddS2`.
- Contract:
`src/AVEVA.Historian.Client/Wcf/Contracts/IHistoryServiceContract2.cs:75-80`,
already has `[MessageParameter(Name = "pBuf")]`. **Audit
requirement:** verify against `ildasm aahClientAccessPoint.exe`
that `Handle` and `errorBuffer` parameter names also match — the
handoff's parameter-name-mismatch class has bitten ~30 ops.
- Missing: entire `pBuf` byte layout (likely `UInt16 version + UInt32
sampleCount + N × {tagId GUID, FILETIME, qualityByte, value typed
by CDataType}`); whether `Handle` is the same Open2 v6 session GUID
as `UpdC3`/`RTag2`/`EnsT2`; the auth-chain prereqs (event flow
needed Stat priming + Trx/Stat/Retr `GetV` between RTag2 and EnsT2;
writes may have a different chain); success vs error response
shape.
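
The hypothesized `pBuf` layout can be written down as a builder so that the first capture either confirms or refutes it byte by byte. This Python sketch encodes only the guess stated above and adds further assumptions of its own: little-endian fields, no padding, .NET `Guid.ToByteArray` byte order for the tag id, and a double-typed value.

```python
import struct
import uuid
from datetime import datetime, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def to_filetime(dt):
    """Convert an aware datetime to 100 ns ticks since 1601-01-01 UTC."""
    delta = dt - FILETIME_EPOCH
    return (delta.days * 86400 + delta.seconds) * 10_000_000 + delta.microseconds * 10


def build_pbuf(samples, version=1):
    """Build the GUESSED layout: UInt16 version, UInt32 count, then samples."""
    out = bytearray(struct.pack("<HI", version, len(samples)))
    for tag_id, timestamp, quality, value in samples:
        out += tag_id.bytes_le  # assumed .NET Guid byte order
        out += struct.pack("<QBd", to_filetime(timestamp), quality, value)
    return bytes(out)
```

Under these assumptions each sample occupies 33 bytes after a 6-byte header, which gives a quick length check against a captured request before any field-level comparison.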
### 3.3 DelT tag deletion (priority 2 — needed for safe RE)
- WCF op: `aa/Hist/DelT`.
- Contract:
`src/AVEVA.Historian.Client/Wcf/Contracts/IHistoryServiceContract2.cs:21-30`.
- Missing: `tagNames` byte layout (likely length-prefixed
compact-ASCII per the handoff convention); whether server refuses
to delete tags with stored history or cascades; whether `DelT` is
sufficient to fully unregister or leaves orphan rows in
`Runtime.dbo.Tag`.
### 3.4 ModifyData / DeleteData (priority 3 — exists?)
No corresponding WCF op is currently declared. **First step:** static
inspection to confirm whether any managed wrapper exists.
```powershell
dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- methods current\aahClientManaged.dll EditValue
dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- methods current\aahClientManaged.dll ModifyValue
dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- methods current\aahClientManaged.dll EditData
dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- methods current\aahClientManaged.dll DeleteData
```
If no managed wrapper exists, this op is REST-only / SMC-only —
mark as **out of scope** in this doc. Otherwise decode like
§3.1/§3.2.
Parallelism: 3.1 and 3.3 can be developed in parallel because the
operator can create the sandbox tag manually via SMC while SDK code
is being written. 3.2 cannot meaningfully proceed until 3.1 (or the
manual tag) exists. 3.4 method discovery is cheap and may eliminate
its own scope.
## 4. RE Steps in Execution Order
For each workstream above, run these five steps. Mirrors the read
and event flows that recovered the existing protocol.
### 4.a Static method discovery
Find the native serializer:
```powershell
dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- methods current\aahClientManaged.dll AddS
dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- methods current\aahClientManaged.dll EnsureTag
dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- methods current\aahClientManaged.dll DeleteTag
```
Dump IL for each method of interest:
```powershell
dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- dnlib-method --instructions current\aahClientManaged.dll <Type::Method>
```
Save sanitized excerpts to
`docs/reverse-engineering/dnlib-<op>-il-latest.txt`.
### 4.b Wire-byte capture for the request
Same IL-rewrite tooling that captured the 27 outgoing event calls:
```powershell
$captureDir = "artifacts\reverse-engineering\instrumented-wcf-writemessage-writes"
New-Item -ItemType Directory -Force -Path $captureDir | Out-Null
dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- instrument-wcf-writemessage current\aahClientManaged.dll "$captureDir\aahClientManaged.dll"
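# Make sure the copy target exists; Copy-Item does not create missing parent directories.
New-Item -ItemType Directory -Force -Path "$captureDir\current-copy" | Out-Null
Copy-Item -Force "$captureDir\aahClientManaged.dll" "$captureDir\current-copy\aahClientManaged.dll"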
$env:AVEVA_HISTORIAN_RE_CAPTURE = (Resolve-Path $captureDir).Path + "\writemessage-capture-write-latest.ndjson"
```
A new harness scenario `--scenario write` needs to be added to
`tools/AVEVA.Historian.NativeTraceHarness` to drive the native
wrapper's `AddStreamValues2` against the sandbox tag. Suggested
new args: `--write-sandbox-tag`, `--write-value`.
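
A minimal sketch of how the proposed arguments could be parsed, assuming the harness receives a plain `string[]`. The `WriteScenarioOptions` and `WriteScenarioArgs` names are hypothetical and are not part of the existing harness. A numeric `--write-value` is also an assumption; discrete and string writes would need a wider type.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical options for the proposed `--scenario write` harness mode.
public sealed record WriteScenarioOptions(string SandboxTag, double Value);

public static class WriteScenarioArgs
{
    // Parses `--write-sandbox-tag <name>` and `--write-value <number>`.
    // Returns null with an error message instead of guessing a tag name.
    public static WriteScenarioOptions? TryParse(IReadOnlyList<string> args, out string? error)
    {
        string? tag = null;
        double? value = null;

        // Stop one short of the end: every recognized flag consumes the next token.
        for (int i = 0; i < args.Count - 1; i++)
        {
            switch (args[i])
            {
                case "--write-sandbox-tag":
                    tag = args[++i];
                    break;
                case "--write-value":
                    if (!double.TryParse(args[++i], NumberStyles.Float,
                            CultureInfo.InvariantCulture, out double parsed))
                    {
                        error = "--write-value must be a number";
                        return null;
                    }
                    value = parsed;
                    break;
            }
        }

        if (string.IsNullOrWhiteSpace(tag))
        {
            error = "--write-sandbox-tag is required";
            return null;
        }
        if (value is null)
        {
            error = "--write-value is required";
            return null;
        }

        error = null;
        return new WriteScenarioOptions(tag, value.Value);
    }
}
```

Refusing to run without an explicit tag keeps the harness from ever writing to a tag the operator did not name.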
### 4.c Wire-byte capture for the response
Symmetric `instrument-wcf-readmessage`:
```powershell
dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- instrument-wcf-readmessage current\aahClientManaged.dll "$captureDir\aahClientManaged.dll"
```
The success response for `AddS2` is just `<AddS2Result>true</…>` +
empty `errorBuffer`. **Capture at least one negative case** (write
to non-existent tag, or write with malformed CDataType) so the
orchestrator can surface diagnostics like
`HistorianWcfEventOrchestrator.LastErrorBufferDescription`.
### 4.d Decode against IL
1. Strip the SOAP/MDAS envelope.
2. Align byte offsets against the native serializer IL from 4.a (the
`ldc.i4 / call WriteByte` sequence makes field order and constants
explicit).
3. Cross-reference the `CDataType` table from
`data-query-request-ctor-il-latest.txt` to interpret typed value
bytes.
4. Write a parser-and-builder pair and verify it against the captured
bytes before committing.
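
The parser-and-builder check can be sketched as a round trip: a capture counts as decoded only when parsing it and rebuilding it reproduces the original bytes exactly. The record layout below is a placeholder invented for illustration and is NOT the `AddS2` payload. The real field order, widths, and constants have to come from the IL in 4.a and the captures in 4.b.

```csharp
using System;
using System.IO;

// PLACEHOLDER layout (version, tag key, FILETIME, double). It exists only to
// demonstrate the round-trip technique and says nothing about the real wire format.
public readonly record struct SampleRecord(byte Version, uint TagKey, long FileTimeUtc, double Value);

public static class SampleRecordCodec
{
    public static byte[] Build(SampleRecord record)
    {
        using var stream = new MemoryStream();
        using (var writer = new BinaryWriter(stream)) // BinaryWriter emits little-endian
        {
            writer.Write(record.Version);
            writer.Write(record.TagKey);
            writer.Write(record.FileTimeUtc);
            writer.Write(record.Value);
        }
        return stream.ToArray(); // ToArray still works after the stream is closed
    }

    public static SampleRecord Parse(byte[] bytes)
    {
        using var reader = new BinaryReader(new MemoryStream(bytes));
        var record = new SampleRecord(
            reader.ReadByte(), reader.ReadUInt32(), reader.ReadInt64(), reader.ReadDouble());

        // Unconsumed bytes mean the layout is not fully understood yet.
        if (reader.BaseStream.Position != bytes.Length)
            throw new InvalidDataException("Trailing bytes: the decoded layout is incomplete.");
        return record;
    }

    // True only if parse followed by build reproduces the capture byte for byte.
    public static bool RoundTrips(byte[] captured) =>
        Build(Parse(captured)).AsSpan().SequenceEqual(captured);
}
```

Running `RoundTrips` over every sanitized capture before committing catches fields that were skipped or guessed, because any unexplained byte either throws in `Parse` or breaks the comparison.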
### 4.e Implement managed serializer + tests
New code under `src/AVEVA.Historian.Client/Wcf/`:
- `HistorianAddStreamValuesProtocol.cs` — `Serialize(...)` returns
`byte[] pBuf`, mirroring `HistorianAddTagsProtocol`.
- Extend (or split) `HistorianAddTagsProtocol` for the analog /
discrete / string `EnsT2` shapes.
- `HistorianWcfWriteOrchestrator.cs` — chains `Hist.GetV →
Hist.ValCl × 2 → Hist.Open2 → UpdC3 → priming chain (TBD per
§3.2) → AddS2 loop → Close2`.
Public surface on `HistorianClient`:
- `WriteValueAsync(tag, value, timestampUtc, quality)`
- `WriteValuesAsync(IReadOnlyList<HistorianSampleWrite>)`
- `EnsureTagAsync(HistorianTagDefinition)`
- `DeleteTagAsync(string tagName)`
Until evidence supports each path, throw
`ProtocolEvidenceMissingException` (mirrors the existing read
guardrail).
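
A minimal sketch of the guardrail shape for one write method. The exception class here is a local stand-in so the example compiles on its own; the repository's real `ProtocolEvidenceMissingException` may have a different base type and constructor. `WriteSurfaceSketch` is a hypothetical name, and the `quality` parameter is omitted for brevity.

```csharp
using System;
using System.Threading.Tasks;

// Stand-in for the repository's exception type (its actual signature is assumed).
public sealed class ProtocolEvidenceMissingException : NotSupportedException
{
    public ProtocolEvidenceMissingException(string operation)
        : base($"No captured protocol evidence for '{operation}' yet.") { }
}

public sealed class WriteSurfaceSketch
{
    // Validates arguments first so callers get argument errors even before
    // the wire format is known, then refuses to touch the wire.
    public Task WriteValueAsync(string tag, double value, DateTime timestampUtc)
    {
        ArgumentException.ThrowIfNullOrEmpty(tag);
        if (timestampUtc.Kind != DateTimeKind.Utc)
            throw new ArgumentException("Timestamp must be UTC.", nameof(timestampUtc));

        // Replace this throw with the orchestrator call once the golden-byte
        // tests for AddS2 are committed.
        throw new ProtocolEvidenceMissingException("AddS2");
    }
}
```

Keeping the public signature in place while the body throws lets `ProtocolGuardrailTests` pin the behaviour until evidence exists.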
Unit tests under `tests/AVEVA.Historian.Client.Tests/Wcf/`:
- `WcfAddStreamValuesProtocolTests` — golden-byte tests for one
analog, one discrete, one string write.
- `WcfEnsureTagsProtocolTests` — golden-byte tests for the
analog/discrete/string `CTagMetadata` shapes.
- Extend `ProtocolGuardrailTests` so any not-yet-implemented write
path still throws `ProtocolEvidenceMissingException`.
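
Golden-byte tests are easier to debug when a failure reports the first differing offset rather than just "not equal". The helper below is a framework-agnostic sketch; `GoldenBytes` is a hypothetical name, and the hex format (plain two-digit pairs separated by spaces, dashes, or newlines, no `0x` prefixes) is an assumption about how sanitized captures would be stored.

```csharp
using System;
using System.Linq;

public static class GoldenBytes
{
    // Converts plain hex pairs such as "0A 1B-2c" into bytes.
    // Non-hex characters are dropped, so "0x" prefixes are NOT supported.
    public static byte[] FromHex(string hex) =>
        Convert.FromHexString(string.Concat(hex.Where(Uri.IsHexDigit)));

    // Returns -1 when the buffers are equal, otherwise the first differing
    // offset (the shorter length when one buffer is a prefix of the other).
    public static int FirstMismatch(ReadOnlySpan<byte> expected, ReadOnlySpan<byte> actual)
    {
        int common = Math.Min(expected.Length, actual.Length);
        for (int i = 0; i < common; i++)
        {
            if (expected[i] != actual[i]) return i;
        }
        return expected.Length == actual.Length ? -1 : common;
    }

    // Throws with the offset so the failure can be matched to a field in the IL.
    public static void AssertEqual(ReadOnlySpan<byte> expected, ReadOnlySpan<byte> actual)
    {
        int offset = FirstMismatch(expected, actual);
        if (offset >= 0)
        {
            throw new InvalidOperationException(
                $"Golden bytes differ at offset 0x{offset:X} " +
                $"(expected {expected.Length} bytes, got {actual.Length}).");
        }
    }
}
```

A test would then call `GoldenBytes.AssertEqual(GoldenBytes.FromHex(capturedHex), serializerOutput)` inside whatever test framework the repository already uses.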
Live integration tests in `HistorianClientIntegrationTests.cs`,
gated on `HISTORIAN_WRITE_SANDBOX_TAG`:
`WriteValueAsync_WithinDocumentedWindow_PersistsToHistorianDb`
writes a unique value, reads it back via `ReadRawAsync`, and
verifies via direct `sqlcmd` to the History extension table.
## 5. Order of Operations
```
3.4 method discovery (cheap; may eliminate scope)
3.1 EnsT2 (analog/discrete/string)  ──► sandbox tag exists
        ├─────────────────────────────┐
        ▼                             ▼
3.2 AddS2 (priority 1)          3.3 DelT (sandbox cleanup)
3.4 ModifyData/DeleteData (only if 3.4 confirmed scope)
public surface, golden-byte tests, integration tests
```
3.2 is the headline win and depends only on 3.1 (or a manually
created sandbox tag). 3.3 must land before any commit that
programmatically creates new tags; until then, manual SMC deletion
is the documented rollback.
## 6. Risks and Mitigations
### 6.a Auth chain may differ for writes
Reads use `Hist.Open2(ConnectionMode = 0x402)`. Events use the same
`0x402` plus a Stat-priming chain. Writes may need a different
mode (the handoff notes `0x501` was an unverified guess for
events; writes may legitimately need `0x401` or another value).
Mitigation: capture the *full* WriteMessage sequence for a native
write session (not just `AddS2`) to see what `Open2` payload and
priming calls the native wrapper sends.
### 6.b Server-side session-table requirement
Writes may require `RTag2` after `EnsT2` and before `AddS2` (the
event flow needs `RTag2(CmEventTagId)`). The "tag identifier" the
server returns from `EnsT2` may differ from the GUID the client
seeded.
Mitigation: capture the analog `EnsT2` `OutBuff` (event flow's was
a 45-byte echo) and verify whether subsequent `AddS2` payloads
reference the client-seeded GUID, the server-returned GUID, or a
numeric `wwTagKey`. SQL ground truth: `SELECT TagName, wwTagKey
FROM Tag WHERE TagName = '...'`.
### 6.c Silent-success failure mode
`AddS2` may return `true` but no row appears in the History
extension table — the engine silently drops samples outside the
`FutureTimeThreshold` / `RealTimeWindow` system parameters (which
the event flow now reads).
Mitigation: always write at `DateTime.UtcNow`; cross-check with
SQL after every test:
```sql
SELECT TOP 5 DateTime, Value, QualityDetail
FROM History
WHERE wwTagKey = (SELECT wwTagKey FROM Tag WHERE TagName = @sandbox)
AND DateTime BETWEEN @windowStart AND @windowEnd
ORDER BY DateTime DESC;
```
Surface `FutureTimeThreshold` / `RealTimeWindow` via existing
`GetSystemParameterAsync` so failures are diagnosable.
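
A client-side pre-check can make the silent-drop case visible before the write is sent. The sketch below assumes the accepted window is `[now - RealTimeWindow, now + FutureTimeThreshold]` and that both parameters have already been converted to `TimeSpan`. Both the window semantics and the units are assumptions to confirm against the server, and `WriteWindow` is a hypothetical name.

```csharp
using System;

public static class WriteWindow
{
    // ASSUMPTION: the engine keeps samples with
    // now - realTimeWindow <= timestamp <= now + futureTimeThreshold.
    public static bool IsInside(
        DateTime timestampUtc, DateTime nowUtc,
        TimeSpan realTimeWindow, TimeSpan futureTimeThreshold)
    {
        return timestampUtc >= nowUtc - realTimeWindow
            && timestampUtc <= nowUtc + futureTimeThreshold;
    }

    // Produces a diagnostic string for logs when a write is likely to be dropped.
    public static string Describe(
        DateTime timestampUtc, DateTime nowUtc,
        TimeSpan realTimeWindow, TimeSpan futureTimeThreshold)
    {
        if (IsInside(timestampUtc, nowUtc, realTimeWindow, futureTimeThreshold))
            return "inside window";

        TimeSpan delta = timestampUtc - nowUtc;
        return delta < TimeSpan.Zero
            ? $"{-delta.TotalSeconds:F0}s in the past exceeds RealTimeWindow ({realTimeWindow.TotalSeconds:F0}s)"
            : $"{delta.TotalSeconds:F0}s in the future exceeds FutureTimeThreshold ({futureTimeThreshold.TotalSeconds:F0}s)";
    }
}
```

This does not replace the SQL cross-check; it only turns one likely cause of a missing row into an explicit log line.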
### 6.d Storage service vs History service
`IStorageServiceContract` also exposes `AddT/AddS/AddS2/DelT`. The
working hypothesis is that `/Hist` is client-facing and `/Stor` is
engine-internal, but it's not yet verified.
Mitigation: the WriteMessage capture (§4.b) shows the actual
service path on the wire. If it goes to `/Stor`, update the
orchestrator. Do NOT preemptively implement against both.
### 6.e Parameter-name mismatches
Handoff already flagged `EnsT`, `EnsT2`, `RTag2`, `ExKey`, `StJb`,
`GtJb` for the same `inBuff`/`inputBuffer` mismatch class that
broke reads for weeks. Until each is audited against the server
contract, any request whose parameter name is mismatched binds to
null on the server, which then throws a `NullReferenceException`.
Mitigation: before the first write WriteMessage capture, run an
`ildasm` audit against `aahClientAccessPoint.exe` for the exact
parameter names of `EnsT2`, `AddS2`, and `DelT`, and reconcile
against the existing `[MessageParameter]` attributes.
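
Once the server-side names have been read out of the `ildasm` dump and the client-side names out of the contract attributes, the reconciliation itself is a mechanical comparison. The sketch below assumes both sides have been collected by hand into dictionaries keyed by operation name; `ParameterNameAudit` is a hypothetical helper and does no IL parsing. The comparison is ordinal and order-sensitive because the names are matched as XML element names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParameterNameAudit
{
    // Returns one finding per operation whose client parameter names do not
    // match the server's names exactly (case-sensitive, in declaration order).
    public static IReadOnlyList<string> FindMismatches(
        IReadOnlyDictionary<string, string[]> serverNames,
        IReadOnlyDictionary<string, string[]> clientNames)
    {
        var findings = new List<string>();
        foreach (var (operation, server) in serverNames.OrderBy(p => p.Key, StringComparer.Ordinal))
        {
            if (!clientNames.TryGetValue(operation, out var client))
            {
                findings.Add($"{operation}: no client declaration");
                continue;
            }
            if (!server.SequenceEqual(client, StringComparer.Ordinal))
            {
                findings.Add(
                    $"{operation}: server({string.Join(", ", server)}) " +
                    $"!= client({string.Join(", ", client)})");
            }
        }
        return findings;
    }
}
```

An empty result for `EnsT2`, `AddS2`, and `DelT` is the precondition for treating any later capture failure as a payload problem rather than a binding problem.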
### 6.f Customer-data exposure in capture files
Write captures contain the sandbox tag name and any value the test
wrote. Not secrets, but noise.
Mitigation: keep all
`instrumented-wcf-writemessage-writes/` artifacts under
`artifacts/` (already gitignored). Sanitize tag names to
`<sandbox-tag>` before committing decoded bytes into
`docs/reverse-engineering/`.
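
The sanitization step can be sketched as two small helpers: one that redacts the tag name in decoded text excerpts, and one that flags binary payloads still containing the name so they can be reviewed by hand. `CaptureSanitizer` is a hypothetical name, and checking only UTF-8 and UTF-16LE is an assumption about how the tag name is encoded on the wire.

```csharp
using System;
using System.Text;

public static class CaptureSanitizer
{
    public const string Placeholder = "<sandbox-tag>";

    // Replaces every occurrence of the tag name in a text excerpt, ignoring case.
    public static string RedactText(string text, string tagName)
    {
        ArgumentException.ThrowIfNullOrEmpty(tagName);
        return text.Replace(tagName, Placeholder, StringComparison.OrdinalIgnoreCase);
    }

    // Flags payloads that still contain the tag name as raw bytes.
    // Only an exact-case match in UTF-8 or UTF-16LE is detected (assumed encodings).
    public static bool ContainsTagBytes(ReadOnlySpan<byte> payload, string tagName)
    {
        ArgumentException.ThrowIfNullOrEmpty(tagName);
        ReadOnlySpan<byte> utf8 = Encoding.UTF8.GetBytes(tagName);
        ReadOnlySpan<byte> utf16 = Encoding.Unicode.GetBytes(tagName);
        return payload.IndexOf(utf8) >= 0 || payload.IndexOf(utf16) >= 0;
    }
}
```

The byte check deliberately only detects and does not rewrite: replacing bytes inside a length-prefixed payload would invalidate the capture as a golden reference.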
## 7. Success Criteria
Per op:
- **`EnsT2(analog)`**: `EnsureTagAsync(new HistorianTagDefinition {
Name = sandbox, DataType = Analog })` returns success;
`sqlcmd -E -S . -d Runtime -Q "SELECT TagName FROM Tag WHERE
TagName = '...'"` returns one row.
- **`EnsT2(discrete, string)`**: same shape with corresponding
`DataType`; SQL check uses `DiscreteTag` / `StringTag` view.
- **`AddS2`**: `WriteValueAsync(sandbox, 42.0, DateTime.UtcNow)`
returns success; `ReadRawAsync` returns the value;
`SELECT TOP 1 Value FROM History WHERE wwTagKey = ? AND DateTime
BETWEEN ? AND ?` returns the same value.
- **`DelT`**: `DeleteTagAsync(sandbox)` returns success and SQL
returns zero rows from `Tag`.
- **`ModifyData` / `DeleteData`**: deferred until §3.4 method
discovery confirms scope.
Cross-cutting:
- All new code in `src/AVEVA.Historian.Client/` is pure managed
.NET 10. No new P/Invoke beyond the existing `HistorianSspiClient`.
- Every new op has a golden-byte unit test.
- `dotnet test .\Histsdk.slnx --no-build --logger
"console;verbosity=minimal"` passes 100%.
- With `HISTORIAN_HOST=localhost`,
`HISTORIAN_WRITE_SANDBOX_TAG=RetestSdkWriteSandbox` set, write
integration tests pass and leave zero residue (test `Dispose`
calls `DelT` for cleanup).
- Sanitization scan returns no real secrets.
- `CLAUDE.md` "Required SDK Surface" updated to add the new write
ops — this is a SCOPE CHANGE that must land *alongside* the
evidence, not before. Do not update the SDK surface doc until
3.1 + 3.2 are at least live-test-green.
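
The env-var gate and the zero-residue cleanup can be combined in one test fixture. The sketch below is framework-agnostic: `SandboxTagFixture` is a hypothetical name, and the delete operation is passed in as a delegate because the eventual `DeleteTagAsync` signature is not yet settled.

```csharp
using System;
using System.Threading.Tasks;

public sealed class SandboxTagFixture : IAsyncDisposable
{
    private readonly Func<string, Task> _deleteTagAsync;
    private bool _disposed;

    public string TagName { get; }

    private SandboxTagFixture(string tagName, Func<string, Task> deleteTagAsync)
    {
        TagName = tagName;
        _deleteTagAsync = deleteTagAsync;
    }

    // Returns null when HISTORIAN_WRITE_SANDBOX_TAG is unset or blank, so
    // write tests can skip instead of inventing a tag name.
    public static SandboxTagFixture? TryCreate(Func<string, Task> deleteTagAsync)
    {
        string? tag = Environment.GetEnvironmentVariable("HISTORIAN_WRITE_SANDBOX_TAG");
        return string.IsNullOrWhiteSpace(tag) ? null : new SandboxTagFixture(tag, deleteTagAsync);
    }

    // Deletes the sandbox tag exactly once, even if disposed repeatedly.
    public async ValueTask DisposeAsync()
    {
        if (_disposed) return;
        _disposed = true;
        await _deleteTagAsync(TagName);
    }
}
```

A test would wrap its body in `await using var fixture = ...` so cleanup runs even when an assertion fails partway through.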
## 8. Open Questions
1. Does `AddS2` go through `/Hist` or `/Stor` on the wire?
2. Does the sandbox tag need pre-configuration via System
Management Console once before `EnsT2` will accept it from a
client (e.g. for `Storage` / `wwDomain` rows the wire protocol
may not be able to populate)?
3. What `ConnectionMode` does the native wrapper use for write
sessions — `0x402` (read mode reused), `0x401`, or something
else?
4. Does `EnsT2(analog)` require any optional ArchestrA
engineering-units fields, or are they purely cosmetic? Affects
how minimal `HistorianTagDefinition` can be.
5. Are there server-side throttles on writes (max samples per
`AddS2`, max calls per second)? If so, should they surface as
batching guidance?
6. What does the server return when `AddS2` is called with a
timestamp older than the tag's earliest stored block? Some
historians silently drop, some error, some accept-and-overwrite.
7. Does the SDK expose write quality as the same
`HistorianSample.Quality` enum used on reads, or a smaller
subset (good/bad)?
8. Is there a managed-side `DelT` path at all? If
`aahClientManaged` only exposes deletion via SMC, §3.3 is
"manual SMC only" and must be documented as such.
## 9. Docs To Update Once Each Workstream Lands
- `CLAUDE.md` "Required SDK Surface" — add `WriteValueAsync`,
`EnsureTagAsync`, `DeleteTagAsync` once 3.1+3.2+3.3 land.
- `AGENTS.md` "Required SDK Surface" — same; update the "alarm-event
write path is dormant" note.
- `docs/reverse-engineering/handoff.md` — add a "Write-flow prereqs"
section symmetric to the existing "Event-flow prereqs".
- `docs/reverse-engineering/wcf-contract-evidence.md` — add evidence
rows for `EnsT2(analog/discrete/string)`, `AddS2`, `DelT`.
- `docs/reverse-engineering/implementation-status.md` — flip
status from "out of scope" to "implemented".
- `README.md` — operation status table.