From 6f01b83313356686033d8c7982109ae4041c6544 Mon Sep 17 00:00:00 2001
From: dohertj2
Date: Mon, 4 May 2026 07:16:32 -0400
Subject: [PATCH] Plan two reverse-engineering campaigns: write commands + store/forward cache

docs/plans/write-commands-reverse-engineering.md (425 lines):
Plan for adding WriteValueAsync (AddS2 stream values), EnsureTags2 for
analog/discrete/string tags, and DelT for sandbox cleanup. Hard safety
rules center on a dedicated sandbox tag gated by env var, time-bounded
writes, SQL ground-truth verification per session, explicit rollback.
Five-step RE workflow mirrors the read/event decode (static IL
discovery -> instrument-wcf-writemessage capture ->
instrument-wcf-readmessage capture -> byte/IL alignment -> managed
serializer + golden-byte tests). Risks call out auth-chain unknowns,
parameter-name-mismatch class, silent-success failure modes,
History-vs-Storage service question.

docs/plans/store-forward-cache-reverse-engineering.md (501 lines):
Plan for replacing the synthesized GetStoreForwardStatusAsync with a
real implementation. Architecture investigation already partially
answered via IL inspection during planning:
ArchestrA.HistorianAccess.GetStoreForwardStatus (token 0x06006187)
reads an in-process C struct via calli to mdas_GetStorageStatus, kept
current by server-pushed WCF callbacks
(IStatusServiceContract2.SetStoreForwardEvent).
CSFConnection.GetSFPipeName indicates a separate Named Pipe sidecar
exists when SF is configured. Five parallelizable discovery
workstreams, six concrete RE steps with cited tokens, eight risks,
eight success criteria.

Both plans deliberately produce no code changes and no captures. They
exist so the next implementer can start with full context.
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 ...store-forward-cache-reverse-engineering.md | 501 ++++++++++++++++++
 .../write-commands-reverse-engineering.md | 425 +++++++++++++++
 2 files changed, 926 insertions(+)
 create mode 100644 docs/plans/store-forward-cache-reverse-engineering.md
 create mode 100644 docs/plans/write-commands-reverse-engineering.md

diff --git a/docs/plans/store-forward-cache-reverse-engineering.md b/docs/plans/store-forward-cache-reverse-engineering.md
new file mode 100644
index 0000000..8172570
--- /dev/null
+++ b/docs/plans/store-forward-cache-reverse-engineering.md
@@ -0,0 +1,501 @@
+# Store/Forward Cache Reverse-Engineering Plan
+
+Last updated: 2026-05-04
+
+This document plans the reverse-engineering effort needed to replace the
+synthesized `GetStoreForwardStatusAsync` in
+`src/AVEVA.Historian.Client/Wcf/HistorianWcfStatusClient.cs` (lines 101-117)
+with a real, evidence-backed implementation. It is a *plan*, not the work
+itself. No code changes; no captures collected.
+
+Read this together with:
+
+- `docs/reverse-engineering/handoff.md` — read/event protocol decoding state
+- `src/AVEVA.Historian.Client/Wcf/Contracts/IStorageServiceContract.cs` — the
+  WCF contract that already declares the SF parameter ops
+- `src/AVEVA.Historian.Client/Models/HistorianStoreForwardStatus.cs` — the
+  output model the implementation must populate
+
+## 1. Goal
+
+"SF support works" means, end-to-end:
+
+1. **Primary deliverable.** `client.GetStoreForwardStatusAsync()` against a
+   live local Historian returns a `HistorianStoreForwardStatus` whose
+   `Pending`, `Storing`, `DataStored`, `ErrorOccurred`, `Error`, `ServerName`,
+   and `ConnectionKind` fields reflect actual server-reported state, not the
+   synthesized defaults at
+   `HistorianWcfStatusClient.cs:107-117`.
+2. **Secondary deliverable.** The SDK can also answer the higher-level
+   "is SF currently buffering?" question accurately when the runtime DB is
+   *down*, not just when it is up.
+   That is the case the real native client handles correctly and where
+   the synthesized default (`Storing = false`, `ErrorOccurred = false`)
+   is silently wrong today.
+3. **Non-goals.** Writing into SF, replaying SF buffers, configuring SF
+   parameters, and redundant-partner SF aggregation
+   (`HistorianStoreForwardStatus.AddPartnerStoreForwardStatus`,
+   token `0x060060B8`). Staying read-only matches the project mission in
+   `CLAUDE.md`.
+
+The success bar is parity with the native wrapper's
+`ArchestrA.HistorianAccess.GetStoreForwardStatus`
+(MD token `0x06006186` in `current/aahClientManaged.dll`),
+not a superset.
+
+## 2. Architecture Investigation (open questions, in priority order)
+
+Answer these before writing any production code. Each has a discovery action
+in §3.
+
+### Q1. Is SF status read from a local in-process struct, a separate WCF endpoint, or a Named Pipe IPC?
+
+Current evidence: all three were plausible a priori, but **the wrapper
+actually uses an in-process struct kept current by server-pushed WCF
+events**. Specifically:
+
+- `ArchestrA.HistorianAccess.GetStoreForwardStatus`
+  (token `0x06006187`, the private 2-arg overload) does *not* call WCF.
+  It calls `mdas_GetStorageStatus` (a `calli` against the
+  `INSQL_MDAS_ERROR (IntPtr handle, uint, HISTORIAN_STORAGE_STATUS*)` C
+  signature in `current/aahClient.dll` exports) and then maps the result
+  through `HistorianAccessUtil.ConvertUnmanagedSFStorageStatusToManagedStorageStatus`
+  (token `0x060060E4`).
+- Mutators like `CConfigStatusClient.SetMdasStoreForwardEvent`
+  (token `0x060029DC`) and `aahClientCommon.CStatus.SetStoreForwardEvent`
+  (token `0x06002A04`) are wired to the WCF callback
+  `IStatusServiceContract2.SetStoreForwardEvent`
+  (`StatusServiceContract.IStatusServiceContract2.SetStoreForwardEvent`,
+  token `0x06005F57`). The server *pushes* SF state changes; the client
+  caches them.
+- Still to confirm across the full method body: read the IL of token
+  `0x06006187` and verify that the only native call is
+  `mdas_GetStorageStatus`.
+  The first 200 instructions already show this path:
+  `GetClient(ConnectionIndex)` → `calli` against the
+  `INSQL_MDAS_ERROR(IntPtr,uint,HISTORIAN_STORAGE_STATUS*)` signature →
+  `ConvertUnmanagedSFStorageStatusToManagedStorageStatus`.
+
+Implication: **the SDK cannot reproduce the wrapper with a single WCF
+call, because the wrapper itself makes none at read time**. It must
+either subscribe to the same status-event stream the native wrapper
+subscribes to, or find a status query that returns the server's cached
+snapshot (Q2).
+
+### Q2. Is there a single-shot WCF query that returns the same snapshot?
+
+Plausibly, but this is unverified. Hypothesis: `IStatusServiceContract2.GetHistorianInfo`
+(`GETHI`, see `IStatusServiceContract2.cs:24-30`) returns a multi-key status
+blob whose schema includes SF state. Alternative: a status-only key passed to
+`GetSystemParameter` (already plumbed via `HistorianWcfStatusClient.GetSystemParameterAsync`).
+Both are testable without writing protocol code by sending probe payloads
+and observing the response shape.
+
+### Q3. Does SF have its own sidecar process / pipe / WCF endpoint we are missing?
+
+Strong evidence the answer is yes when SF is *enabled*:
+
+- `aahClientCommon.CSFConnection.GetSFPipeName` (token `0x06004B72`),
+  `GetSFPath` (`0x06004B71`), `IsConnected` (`0x06004B73`), and `IsEnabled`
+  (`0x06004B6F`) indicate a separately named SF Named Pipe, distinct
+  from the main MDAS pipe.
+- `aahClientCommon.CSFConnection.StartStoreforward` (token `0x06004BC6`).
+- `IStorageServiceContract` already declares `GetStoreForwardParameter`
+  / `SetStoreForwardParameter` (`GetSFP`/`SetSFP`,
+  see `IStorageServiceContract.cs:81-85`), and `Storage` is a separate
+  WCF service slot in `HistorianWcfServiceNames.cs:15`.
+- `CWcfConfig.ConfigurePipeProxy` (token `0x06004B1C`) and
+  `CWcfConfig.ConfigureTcpProxy` (token `0x06004B1B`) confirm that the
+  storage proxy supports both transports, the same dual-transport pattern
+  the History/Retrieval proxies use.
+- `CStorageEngineConsoleClient.GetPipeNameStr` (token `0x06000E2D`) /
+  `GetFullPipeNameStr` (token `0x06000E2E`) wrap the storage-engine
+  console pipe via `STransactPipeClient2` (a *non-WCF* binary pipe
+  protocol).
+
+Open: **is the SF sidecar even running on the dev host this SDK is being
+tested against?** `handoff.md` does not record an SF process being
+observed. `aveva-install-x64/` and `aveva-install-x86/` ship only DLLs
+(no `aahStoreForwardClient.exe` / `aahSFClient.exe` / similar), which
+suggests that the SF sidecar is part of the Historian *server* install,
+not the client redistributable. If so:
+
+- On the developer machine, SF is reachable only because the local
+  Historian server is installed.
+- A pure-client install (the deployment target this SDK ships into) may
+  *never* have SF.
+
+This shapes the success criteria: when SF is not configured, a correct
+implementation returns `Pending = false`, `ErrorOccurred = false`,
+`DataStored = false`, `Storing = false`, i.e. the same shape the
+synthesized defaults produce today. The interesting case is *when SF is
+configured and active*.
+
+### Q4. Is SF state authoritative on the Historian server or on a per-client basis?
+
+The native wrapper reads it from `HistorianClient*` (the per-connection C++
+object), so it is *connection-scoped*, server-pushed state, and we do not
+need to enumerate cluster-wide SF state. This creates a tension with our
+read-only mission: if the server reports only "my SF buffer for this
+client's writes", a client that never writes may always see an idle
+buffer. The working assumption is that the server also reports its SF
+cache for *other* writers to us as a passive observer. Workstream D must
+confirm this.
+
+### Q5. Does any SF probe require Admin?
+
+`CSFConnection.GetSFPipeName` returns a kernel object name. Opening that
+pipe requires its ACL to permit the caller. If the SF pipe is ACL'd to
+`LocalSystem` only, the SDK cannot read it without impersonation, because
+the SDK runs as the calling process. This is a hard limit, not a bug.
+
+## 3. Discovery Workstreams
+
+Workstreams A, B, and C are read-only, can run in parallel, and need
+nothing beyond the existing test rig. Workstream D runs after A and C
+and is the only one that needs an actively storing SF sidecar.
+Workstream E is a fallback if D fails.
+
+### Workstream A — Static IL inspection (parallel-safe, read-only)
+
+Owner action items, in order:
+
+1. Dump full IL of token `0x06006187`
+   (`HistorianAccess.GetStoreForwardStatus(ConnectionIndex,out)`):
+   ```powershell
+   dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- `
+     dnlib-method current\aahClientManaged.dll HistorianAccess.GetStoreForwardStatus --instructions
+   ```
+   Save under `docs/reverse-engineering/historianaccess-getstoreforwardstatus-il-latest.txt`.
+   Confirm the `calli` target signature
+   `INSQL_MDAS_ERROR(IntPtr,uint,HISTORIAN_STORAGE_STATUS*)`, and confirm
+   that the method touches no WCF entry points.
+2. Dump IL of `HistorianAccessUtil.ConvertUnmanagedSFStorageStatusToManagedStorageStatus`
+   (token `0x060060E4`). This is the unmanaged→managed mapping. It
+   tells us which fields of `HISTORIAN_STORAGE_STATUS` populate which
+   fields of `HistorianStoreForwardStatus`, and we will need the same
+   mapping when decoding the wire response.
+3. Inventory every method that *writes* into the local SF status
+   struct:
+   ```powershell
+   dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- methods current\aahClientManaged.dll SetStoreForward
+   dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- methods current\aahClientManaged.dll SetMdasStoreForward
+   ```
+   The known set as of writing:
+   `CConfigStatusClient.SetMdasStoreForwardEvent` (`0x060029DC`),
+   `aahClientCommon.CStatus.SetStoreForwardEvent` (`0x06002A04`),
+   `CStatusConnectionDirect.SetStoreForwardEvent` (`0x06004DF8`),
+   `CStatusConnectionWCF.SetStoreForwardEvent` (`0x06004E4E`),
+   `CClientCommon.SetStoreForwardEventOnServer` (`0x06002EC0`).
+   The `WCF` variant is expected to be the one whose IL maps onto
+   `IStatusServiceContract2.SetStoreForwardEvent`
+   (token `0x06005F57`). Read its IL and document the request/response
+   shape.
+4. Dump the parameter types of `IStatusServiceContract2.SetStoreForwardEvent`
+   (`0x06005F57`).
+   The `[OperationContract]` declaration in the wrapper assembly
+   already encodes the message shape (operation name, parameter names,
+   and types) of what the server pushes to us.
+
+### Workstream B — Install inventory (parallel-safe)
+
+1. Inventory `aveva-install-x64\` and `aveva-install-x86\` for any
+   binary whose name contains `Store`, `Forward`, `SF`, `Cache`, or
+   `Spool`. As of this checkout: **no matches**, because the directories
+   ship only DLLs. Confirm.
+2. Inventory the deployed Historian server (out-of-band; not in this
+   repo) for `aahStoreForwardClient.exe`,
+   `aahStoreForwardServer.exe`, `aahSFCache.exe`, or any service
+   registered with `Description` matching `*Forward*`. Capture the
+   service name, account identity, service ACLs (`accesschk -wuvc`),
+   and pipe ACLs.
+3. Walk the registry: `HKLM\SOFTWARE\ArchestrA\Historian` and any
+   sub-key matching `*StoreForward*`, recording paths and pipe names.
+   Sanitize before committing.
+
+### Workstream C — WCF probe (parallel-safe)
+
+Use the existing `wcf-probe` and `wcf-status` subcommands of
+`tools\AVEVA.Historian.ReverseEngineering`:
+
+1. `wcf-probe $env:HISTORIAN_HOST 32568` — confirm `Storage/GetV` is
+   reachable. (It is the third service slot in
+   `HistorianWcfServiceNames`.) Document the returned interface
+   version.
+2. `wcf-status $env:HISTORIAN_HOST 32568 ` — sweep
+   plausible SF parameter names (`SF.Status`, `StoreForward.State`,
+   `SFCacheBytes`, etc.) through `GetSystemParameter` and record what
+   the server accepts. This is cheap and read-only, and needs no session
+   beyond the already-decoded auth chain.
+3. Probe `GetHistorianInfo` (`GETHI`,
+   `IStatusServiceContract2.cs:24`) with the request byte shape used
+   by the native wrapper. The request bytes are visible if we run
+   `instrument-wcf-readquery`-style instrumentation against
+   `CConfigStatusClient.SetMdasStoreForwardEvent`'s upstream caller
+   (see Workstream D).
+
+### Workstream D — Native capture (sequential after A and C)
+
+Two captures are needed:
+
+1. **Native call to `mdas_GetStorageStatus`.** Run
+   `tools\AVEVA.Historian.NativeTraceHarness` with a new scenario
+   `--scenario sfstatus` (to be added) that invokes
+   `HistorianAccess.GetStoreForwardStatus()` and dumps the
+   `HISTORIAN_STORAGE_STATUS` C struct memory before the managed
+   conversion runs. This pins the binary layout of the struct
+   (offsets, field widths, endianness) without guessing.
+2. **WCF push of SF events.** Configure the local Historian to enter
+   SF mode (stop the runtime DB writer; let the writer's queue
+   trigger SF) and capture the WCF traffic with a new
+   `instrument-wcf-setstoreforwardevent` subcommand, modeled on the
+   existing `instrument-wcf-readquery`, that IL-rewrites
+   `aahClientManaged.dll` to log the bytes the server sends to
+   `IStatusServiceContract2.SetStoreForwardEvent`. Save the rewritten
+   copy under `docs/reverse-engineering/dnlib-write-copy/`, never
+   `current/`.
+
+Workstream D is the only step that needs an actively storing SF
+sidecar. Plan: stop the Historian Runtime DB SQL service, write a
+single test point via the wrapper's writer harness, and capture the
+SF event push; then restart Runtime DB and capture the
+"end-of-SF / data drained" push.
+
+### Workstream E — On-disk cache (only if Workstream D fails)
+
+If the WCF push protocol turns out to be impractical to reproduce
+(e.g. it requires a duplex contract, a callback channel, or a server-side
+session-bind we cannot match from our managed client), fall back to
+inspecting the on-disk SF cache directly. Steps:
+
+1. Resolve `CSFConnection.GetSFPath` IL to find the cache directory
+   convention. A location such as `%ProgramData%\ArchestrA\Historian\Cache\`
+   is only a guess until the IL confirms it; **never assume the path**.
+2. Inventory the file types the directory actually contains
+   (`.sfdata`, `.sfindex`, and `.cache` are guesses, not findings).
+3. Decode the file header. The presence/size of `.sfdata` (or
+   equivalent) files is expected to be sufficient to populate
+   `DataStored` and `Pending`; we do not need to decode the value
+   payload.
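A minimal Python sketch of Workstream E step 3 follows. It assumes the cache directory has already been resolved from the `CSFConnection.GetSFPath` IL and is passed in by the caller, so the helper never guesses a path. The `.sfdata` extension is one of the plan's unconfirmed guesses. The `DataStored`/`Pending` rule shown here is a hypothetical illustration, not decoded behavior: any data file counts as stored, and a non-zero total size counts as pending. The production SDK is C#; Python is used only to show the logic.

```python
from pathlib import Path

# Candidate extension taken from the plan's guesses (hypothetical); replace it
# with whatever Workstream E actually finds in the real cache directory.
CANDIDATE_DATA_EXTENSIONS = (".sfdata",)

NO_SF = {"DataStored": False, "Pending": False, "file_count": 0, "total_bytes": 0}


def summarize_sf_cache(cache_dir, data_extensions=CANDIDATE_DATA_EXTENSIONS):
    """Summarize an on-disk SF cache from file presence and size only.

    cache_dir must come from the decoded CSFConnection.GetSFPath IL; this
    helper never guesses it. A PermissionError from the walk is not caught.
    """
    root = Path(cache_dir)
    if not root.is_dir():
        # Missing directory: report the all-false "no SF" shape.
        return dict(NO_SF)

    wanted = {ext.lower() for ext in data_extensions}
    files = [p for p in root.rglob("*") if p.is_file() and p.suffix.lower() in wanted]
    total_bytes = sum(p.stat().st_size for p in files)
    return {
        # Hypothetical rule: any data file means something was stored.
        "DataStored": bool(files),
        # Hypothetical rule: non-empty data files mean data is waiting to drain.
        "Pending": total_bytes > 0,
        "file_count": len(files),
        "total_bytes": total_bytes,
    }
```

The helper deliberately does not catch `PermissionError`. A cache directory locked down to `LocalSystem` therefore surfaces as an error rather than as a false all-clear, which matters under the privilege risk discussed in §5.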
+
+This fallback is only for `DataStored` / `Pending`. `Storing` and
+`Error` fundamentally require a live server-state read.
+
+## 4. Concrete Reverse-Engineering Steps (execution order)
+
+These steps mirror the read/event decoding workflow that succeeded for
+raw queries.
+
+### Step 1 — Find native methods that touch SF
+
+Already done; baseline evidence is recorded in §2 Q1/Q3 above. Key
+tokens to reference:
+
+- `0x06006186`, `0x06006187` — public/private
+  `HistorianAccess.GetStoreForwardStatus`
+- `0x060060E4` —
+  `HistorianAccessUtil.ConvertUnmanagedSFStorageStatusToManagedStorageStatus`
+- `0x060029DC` — `CConfigStatusClient.SetMdasStoreForwardEvent`
+- `0x06002A04` — `aahClientCommon.CStatus.SetStoreForwardEvent`
+- `0x06002DFF` — `aahClientCommon.CClientCommon.IsInStoreForward`
+- `0x06002E18` — `aahClientCommon.CClientCommon.SetStoreForwardParams`
+- `0x06002EC0` — `CClientCommon.SetStoreForwardEventOnServer`
+- `0x06004BC6` — `aahClientCommon.CSFConnection.StartStoreforward`
+- `0x06004B6F`..`0x06004B73` — CSFConnection getters (path, pipe,
+  enabled, connected)
+- `0x06004DF8`, `0x06004E4E` — direct vs WCF status connections
+- `0x06005F57` — `IStatusServiceContract2.SetStoreForwardEvent` MD ref
+- `0x06006193` — `HistorianAccess.IsBothConnectionRequested` (used by
+  the public arity-0 GetStoreForwardStatus to decide whether to fan
+  out to a redundant partner)
+
+### Step 2 — Decode `HISTORIAN_STORAGE_STATUS` layout
+
+Run Workstream A.2 (decode token `0x060060E4`) and Workstream D.1
+(native struct memory dump). Together they pin the field layout.
+
+The managed model fields we already know we need to populate
+(from `HistorianStoreForwardStatus.cs`) are
+`ServerName`, `Pending`, `ErrorOccurred`, `Error`, `DataStored`,
+`Storing`, and `ConnectionKind`. The native struct probably has seven
+or more fields, plus padding. Express the mapping as a comment table in
+the implementation.
+
+### Step 3 — Decide the wire model
+
+Two possible implementations:
+
+1. **Push-mode (native parity).** The SDK opens an authenticated WCF
+   session that the server treats as a status subscriber, listens
+   for `IStatusServiceContract2.SetStoreForwardEvent` callbacks,
+   maintains a local cache, and `GetStoreForwardStatusAsync`
+   returns from the cache. This requires a WCF duplex contract
+   (`CallbackContract`), which is not currently exercised
+   anywhere in `src/AVEVA.Historian.Client/Wcf/`.
+2. **Pull-mode (probe).** The SDK calls `GetHistorianInfo` (`GETHI`)
+   or a discovered `Storage`-service equivalent and maps the
+   one-shot response. No subscription state is required.
+
+Pull-mode is strongly preferred: it matches the SDK's existing
+WCF style, avoids duplex contracts, and fits the shape of the existing
+`HistorianWcfStatusClient.GetSystemParameter` code path. Fall back to
+push-mode only if Workstream C.3 shows that the server has no pull
+endpoint that returns SF state.
+
+### Step 4 — Implement the managed contract method
+
+Assuming Step 3 selects pull-mode, implement against the WCF contract
+(likely a new `[OperationContract]` on `IStatusServiceContract2`
+or a method on `IStorageServiceContract`). Follow the existing
+parameter-naming discipline from the resolved
+`ValidateClientCredential` blocker:
+**use `[MessageParameter(Name = "...")]` to match exact server
+element names; do not let WCF derive them from C# parameter
+names.** See the `handoff.md` "Active Blocker" entry for the
+2026-05-04 fix.
+
+### Step 5 — Add golden-byte fixtures
+
+Add request and response fixtures under
+`fixtures/protocol/store-forward-status/`:
+
+- `request-get-storage-status.bin` — bytes the SDK sends.
+- `response-get-storage-status-running-normal.bin` — server
+  not in SF.
+- `response-get-storage-status-active-sf.bin` — server actively
+  storing.
+- `response-get-storage-status-error.bin` — server's SF errored.
+
+Capture sources: the same instrumented native wrapper runs that
+populate Workstream D. Sanitize hostnames, GUIDs, and timestamps
+before committing.
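Golden-byte parse tests are easier to debug when a failure reports *where* the bytes diverge, not just that they differ. Below is a small Python sketch of such a comparison helper. The names `first_mismatch`, `describe_mismatch`, and `assert_golden` are hypothetical. The real tests would be C# in `WcfStoreForwardStatusProtocolTests`, but the offset-plus-hex-window idea carries over directly.

```python
def first_mismatch(expected: bytes, actual: bytes):
    """Return the first differing byte offset, or None if the buffers are identical."""
    for offset, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return offset
    if len(expected) != len(actual):
        # One buffer is a prefix of the other: they diverge at the shorter length.
        return min(len(expected), len(actual))
    return None


def describe_mismatch(expected: bytes, actual: bytes, context: int = 8) -> str:
    """Human-readable summary with a hex window around the first difference."""
    offset = first_mismatch(expected, actual)
    if offset is None:
        return "identical"
    lo = max(0, offset - context)
    hi = offset + context
    return (
        f"first difference at offset {offset} (0x{offset:X}); "
        f"expected[{lo}:{hi}]={expected[lo:hi].hex(' ')} "
        f"actual[{lo}:{hi}]={actual[lo:hi].hex(' ')}"
    )


def assert_golden(fixture_path, actual: bytes) -> None:
    """Compare produced bytes against a committed .bin fixture."""
    with open(fixture_path, "rb") as fh:
        expected = fh.read()
    if expected != actual:
        raise AssertionError(f"{fixture_path}: {describe_mismatch(expected, actual)}")
```

Consider a parser change that shifts one field. A call such as `assert_golden("fixtures/protocol/store-forward-status/response-get-storage-status-active-sf.bin", produced)` then fails with the exact offset, and that offset can be lined up against the Step 2 layout table.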
+
+### Step 6 — Replace the synthesized stub
+
+Replace `SynthesizeStoreForwardStatus` (lines 107-117 of
+`HistorianWcfStatusClient.cs`) with a real implementation. Keep
+the synthesized fallback for the case where the storage service
+returns a "no SF configured" sentinel. That is *not* an error
+condition; it is the normal state for client-only deployments.
+
+Add a unit test class `WcfStoreForwardStatusProtocolTests` next
+to the existing `WcfDataQueryProtocolTests` etc., with golden-byte
+parse tests using the fixtures from Step 5.
+
+Update the operation status table in `README.md:20` from
+"synthesized defaults (no SF sidecar to probe)" to
+"live-verified" once the integration test passes.
+
+## 5. Risks and Gotchas
+
+1. **SF may not be present on the test host.** The dev Historian
+   probably has SF disabled by default, and driving it into SF mode
+   means stopping Runtime DB SQL services, which is invasive. Plan to do
+   capture work on a dedicated sacrificial Historian VM, not the
+   shared dev box.
+2. **SF sidecar may require Admin or LocalSystem to query.** If so,
+   the direct pipe or on-disk fallback (Workstream E) may fail under
+   standard user accounts. Document the privilege requirement explicitly
+   in the SDK XML doc comments on `GetStoreForwardStatusAsync`.
+3. **State is volatile.** Probes that take more than about 100 ms can
+   race against the server's own SF state machine. Capture *both*
+   request and response in the same instrumented run; do not
+   try to correlate two captures.
+4. **Push-mode would force a duplex WCF contract.** None of the
+   existing decoded operations use duplex. Adding it widens the
+   managed WCF surface significantly and risks .NET-WCF
+   compatibility issues we have not yet hit. Pull-mode first.
+5. **The wrapper's `IsBothConnectionRequested` (token `0x06006193`)
+   path indicates a "primary + partner" topology.** This is out of scope
+   for this pass per §1, but if the server returns partner data
+   in the same response we must skip-decode (not throw on)
+   unknown trailing bytes.
+6. **`Open2`-only sessions may never receive SF events.** The
+   `handoff.md` "Active Blocker" entry notes that the wrapper's full chain
+   (`OpenConnection3` after the `ValCl` rounds) is the path that
+   produces a session the server treats as a real client. Plan to run
+   SF probes from inside that chain, re-using
+   `HistorianWcfAuthChainHelper.OpenAuthenticatedConnection`
+   (the same call site already used by `GetSystemParameter` at
+   `HistorianWcfStatusClient.cs:42`), unless open question 6 in §7
+   shows that `Open2` is enough.
+7. **`HISTORIAN_STORAGE_STATUS` field order is not contractual.**
+   The struct is defined in AVEVA's closed-source C++. If AVEVA reorders
+   fields between Historian versions, our decoder breaks. Pin the
+   decoder to the Historian server version observed at session
+   open (already exposed via `IRetrievalServiceContractN`) and
+   reject mismatched versions explicitly with
+   `ProtocolEvidenceMissingException`. Do not silently best-effort
+   parse.
+8. **Sanitization.** Pipe names, registry paths, and SF cache
+   directory paths can leak hostnames and account names. Run the
+   `rg` sanitizer (`handoff.md` "Next Pickup Steps") after every
+   doc edit.
+
+## 6. Success Criteria
+
+A real implementation is "done" when all of the following hold:
+
+1. `client.GetStoreForwardStatusAsync()` returns
+   `Pending = true` and `Storing = true` while the local
+   Historian's SF cache is actively buffering writes (verifiable
+   by stopping the Runtime DB and writing a value).
+2. It returns `Pending = false` and `Storing = false` within
+   5 seconds after the Runtime DB recovers and SF drains.
+3. It returns `ErrorOccurred = true` and a non-null, actionable
+   `Error` message when the SF cache itself fails (disk full,
+   pipe closed, etc.).
+4. It returns the synthesized "no SF" shape (all-false) without
+   throwing on a Historian where SF is not configured.
+5. Two new golden-byte unit tests pass (active-SF and idle-SF
+   responses).
+6. `ProtocolGuardrailTests` no longer needs to exempt
+   `GetStoreForwardStatusAsync` from any "must throw
+   `ProtocolEvidenceMissingException`" rule, because the method is now
+   evidence-backed.
+7. Live integration test
+   `HistorianClientIntegrationTests.GetStoreForwardStatusAsync_ReturnsServerState`
+   (to be added) passes when `HISTORIAN_HOST` is set and skips
+   cleanly otherwise.
+8. The `README.md:20` operation status table is updated from
+   "synthesized defaults" to "live-verified".
+
+## 7. Open Questions for the Implementer
+
+Resolve these before writing production code:
+
+1. Does the server expose a *pull* endpoint that returns the full
+   `HISTORIAN_STORAGE_STATUS` snapshot, or only push events?
+   (Workstream C.3 answers this.)
+2. What is the binary layout of `HISTORIAN_STORAGE_STATUS`?
+   (Workstream A.2 + D.1.)
+3. What is the `[OperationContract]` shape on
+   `IStatusServiceContract2.SetStoreForwardEvent`, specifically the
+   parameter count, the byte-buffer parameters, and the exact
+   `MessageParameter` names? (Workstream A.4.)
+4. Is the `Storage` service slot at
+   `net.pipe:///Storage` and `net.tcp://:32568/Storage`
+   reachable on a non-Historian-server install, or does it fail to
+   respond when only the client redistributable is present?
+   (Workstream B + C.1.)
+5. Does the SF status snapshot include partner / redundant SF
+   state inline, or is it returned from a separate call?
+   (Workstream A.1, including the public overload `0x06006186`;
+   look for branches under `IsBothConnectionRequested`.)
+6. Does the SF status read require `OpenConnection3` to have
+   succeeded, or is `Open2` enough? (Trial: try the discovered
+   pull endpoint after `Open2` only, before doing
+   `OpenConnection3`. If it works, the implementation is much
+   simpler.)
+7. What happens when SF is *disabled* by configuration vs
+   *enabled but idle*?
+   Both should map to `Pending=false, Storing=false`, but the
+   underlying server response may be either a sentinel error or an
+   all-zeros struct. The implementation must distinguish "no SF"
+   (return defaults silently) from "SF errored" (return
+   `ErrorOccurred = true`).
+
+## 8. Out of Scope
+
+Explicitly not part of this plan:
+
+- SF write-back (the project mission is read-only;
+  `IStorageServiceContract.AddStreamValues` etc. stay
+  unimplemented).
+- Setting SF parameters
+  (`IStorageServiceContract.SetStoreForwardParameter`).
+- Redundant-partner SF aggregation
+  (`HistorianStoreForwardStatus.AddPartnerStoreForwardStatus`).
+- Reverse-engineering the on-disk SF cache file format beyond
+  presence / file count (Workstream E is a fallback, not a
+  primary deliverable).
+- Anything in the
+  `aahClientCommon.CSFConnection.StartStoreforward` /
+  `SetStorageStopped` / `SetTagSynchronized` write surface.
diff --git a/docs/plans/write-commands-reverse-engineering.md b/docs/plans/write-commands-reverse-engineering.md
new file mode 100644
index 0000000..c4bc4b4
--- /dev/null
+++ b/docs/plans/write-commands-reverse-engineering.md
@@ -0,0 +1,425 @@
+# Plan: Reverse-Engineering Write Commands
+
+Status: PLAN ONLY (no implementation yet). Extends the read/event
+work in `docs/reverse-engineering/handoff.md` (2026-05-04).
+
+## 1. Goal
+
+"Write commands work" means the production SDK at
+`src/AVEVA.Historian.Client/` performs these operations end-to-end
+against a live AVEVA Historian, with parsed responses, golden-byte
+unit tests, and gated live integration tests.
+
+In scope:
+
+1. **`AddS2` (`IHistoryServiceContract2.AddStreamValues2`)** — push
+   one or more timestamped samples for an existing historized tag.
+   Primary use case: an OPC UA driver pushing values to the
+   Historian.
+2. **`EnsT2` (`IHistoryServiceContract2.EnsureTags2`) for
+   analog/discrete/string data tags** — so far decoded only for the
+   `CM_EVENT` AnE-event tag in
+   `src/AVEVA.Historian.Client/Wcf/HistorianAddTagsProtocol.cs`. The
+   `CTagMetadata` byte layout for `CDataType` ∈ {1, 2, 3, 4} is the
+   new evidence target.
+3. **`DelT` (`IHistoryServiceContract2.DeleteTags`)** — needed for
+   safe sandbox cleanup during RE.
+4. **`ModifyData` / `DeleteData`** — only if §3.4 method discovery
+   confirms a managed WCF op exists.
+
+Out of scope: tag-extended-properties (`AddTEx` / `DelTep`),
+`ExKey`, `SetSFP`, snapshot send (`SendSnapshotBegin/End/Snapshot`),
+tag-id-pair maintenance, shard splits, flush ops, all
+`IStorageServiceContract` writes (engine-internal — see §6.d), event
+writes (events come from AVEVA AnE, we only read them), schema
+changes (forbidden over the wire).
+
+## 2. Safety Constraints
+
+The Runtime DB holds production data even on `localhost`. `AddS2`
+writes are persistent: they go to compressed history blocks and
+cannot be removed through any client-facing surface.
+
+Hard rules:
+
+1. **Single dedicated sandbox tag.** Add env var
+   `HISTORIAN_WRITE_SANDBOX_TAG = "RetestSdkWriteSandbox"`. Live
+   write tests refuse to run when it is unset, even when other
+   `HISTORIAN_*` vars are set.
+2. **Never write to** any tag named in `HISTORIAN_TEST_TAG`,
+   `HISTORIAN_TAG_FILTER`, the docs, the test fixtures, or the
+   captured RE ndjson. The read fixture
+   `OtOpcUaParityTest_001.Counter` is OFF-LIMITS for writes.
+3. **Documented rollback.** Every write session records its time
+   window to
+   `artifacts/reverse-engineering/write-sandbox-window-.json`
+   so SQL `SELECT * FROM History WHERE wwTagKey = @tagKey AND DateTime
+   BETWEEN @s AND @e` can identify exactly which rows the session
+   inserted. Tag rollback is via decoded `DelT` (§3.3) once
+   available, or manually via System Management Console until then.
+4. **Time bounds on writes.** Every `AddS2` test uses
+   `DateTime.UtcNow` ± a small offset, so writes land inside
+   the live `RealTimeWindow` / `FutureTimeThreshold` system
+   parameters and cannot accidentally overwrite older blocks.
+5. **No customer / corporate hosts.** `localhost` only.
+6. **Sanitization scan after every session:**
+   `rg -n "(?i)(password|credential|secret|token|||)" docs\reverse-engineering scripts tools docs\plans`.
+
+Soft rules:
+
+- Use a separate captures dir
+  (`artifacts/reverse-engineering/instrumented-wcf-writemessage-writes/`)
+  so write captures don't contaminate the existing read/event
+  ndjson.
+- New integration tests follow the existing gating pattern in
+  `tests/AVEVA.Historian.Client.Tests/HistorianClientIntegrationTests.cs`
+  (`Skip = ...` when the env var is unset).
+
+## 3. Discovery Workstreams
+
+### 3.1 EnsT2 for analog/discrete/string tags (priority 1)
+
+- WCF op: `aa/Hist/EnsT2`.
+- Contract:
+  `src/AVEVA.Historian.Client/Wcf/Contracts/IHistoryServiceContract2.cs:82-89`,
+  already declared with `[MessageParameter(Name = "InBuff" / "OutBuff")]`.
+- Existing code: `HistorianAddTagsProtocol.SerializeCmEventCTagMetadata`
+  builds the `CDataType=5` (event) shape.
+- Missing: the `CTagMetadata` byte layout for `CDataType ∈ {1, 2, 3, 4}`
+  (analog double, discrete, string, analog int per the
+  type-code table in `data-query-request-ctor-il-latest.txt`);
+  whether the optional-mask `0x0086` and the 5-byte trailer
+  `2F 27 01 01 01` change per type; and the analog engineering-units /
+  range / deadband fields (which likely populate the bytes that are zero
+  in the event-tag fixture).
+
+### 3.2 AddS2 stream values (priority 1)
+
+- WCF op: `aa/Hist/AddS2`.
+- Contract:
+  `src/AVEVA.Historian.Client/Wcf/Contracts/IHistoryServiceContract2.cs:75-80`,
+  already has `[MessageParameter(Name = "pBuf")]`.
+  **Audit requirement:** verify against `ildasm aahClientAccessPoint.exe`
+  that the `Handle` and `errorBuffer` parameter names also match. The
+  handoff's parameter-name-mismatch class has bitten ~30 ops.
+- Missing:
+  - the entire `pBuf` byte layout (likely `UInt16 version + UInt32
+    sampleCount + N × {tagId GUID, FILETIME, qualityByte, value typed
+    by CDataType}`);
+  - whether `Handle` is the same Open2 v6 session GUID as
+    `UpdC3`/`RTag2`/`EnsT2`;
+  - the auth-chain prerequisites (the event flow needed Stat priming +
+    Trx/Stat/Retr `GetV` between RTag2 and EnsT2; writes may have a
+    different chain);
+  - the success vs error response shape.
+
+### 3.3 DelT tag deletion (priority 2 — needed for safe RE)
+
+- WCF op: `aa/Hist/DelT`.
+- Contract:
+  `src/AVEVA.Historian.Client/Wcf/Contracts/IHistoryServiceContract2.cs:21-30`.
+- Missing: the `tagNames` byte layout (likely length-prefixed
+  compact-ASCII per the handoff convention); whether the server refuses
+  to delete tags with stored history or cascades; and whether `DelT` is
+  sufficient to fully unregister a tag or leaves orphan rows in
+  `Runtime.dbo.Tag`.
+
+### 3.4 ModifyData / DeleteData (priority 3 — exists?)
+
+No corresponding WCF op is currently declared. **First step:** static
+inspection to check whether any managed wrapper exists.
+
+```powershell
+dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- methods current\aahClientManaged.dll EditValue
+dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- methods current\aahClientManaged.dll ModifyValue
+dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- methods current\aahClientManaged.dll EditData
+dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- methods current\aahClientManaged.dll DeleteData
+```
+
+If no managed wrapper exists, treat this op as REST-only / SMC-only and
+mark it **out of scope** in this doc. Otherwise decode it like
+§3.1/§3.2.
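Once captures exist, the `tagNames` hypothesis from §3.3 can be tested with a round-trip encoder/decoder. The Python sketch below is entirely hypothetical. The plan says only "likely length-prefixed compact-ASCII", so the UInt32 name count, the configurable per-name prefix width, and little-endian byte order are assumptions to check against real `DelT` bytes. They are not decoded facts.

```python
import struct

# HYPOTHETICAL encoding: UInt32 name count, then per name a length prefix
# (1, 2, or 4 bytes, little-endian) followed by ASCII bytes. Unconfirmed.
_PREFIX_FORMATS = {1: "<B", 2: "<H", 4: "<I"}


def encode_tag_names(names, prefix_width=2):
    fmt = _PREFIX_FORMATS[prefix_width]
    out = bytearray(struct.pack("<I", len(names)))  # assumed UInt32 name count
    for name in names:
        raw = name.encode("ascii")  # raises UnicodeEncodeError on non-ASCII names
        out += struct.pack(fmt, len(raw))  # struct.error if the name is too long
        out += raw
    return bytes(out)


def decode_tag_names(buf, prefix_width=2):
    fmt = _PREFIX_FORMATS[prefix_width]
    (count,) = struct.unpack_from("<I", buf, 0)
    offset = 4
    names = []
    for _ in range(count):
        (length,) = struct.unpack_from(fmt, buf, offset)
        offset += prefix_width
        raw = buf[offset:offset + length]
        if len(raw) != length:
            raise ValueError(f"truncated name at offset {offset}")
        names.append(raw.decode("ascii"))
        offset += length
    if offset != len(buf):
        # Refuse to best-effort parse: leftover bytes mean the hypothesis is wrong.
        raise ValueError(f"{len(buf) - offset} trailing bytes after {count} names")
    return names
```

One way to use this: try each prefix width against a captured `DelT` request for the sandbox tag. The first width that decodes cleanly to `["RetestSdkWriteSandbox"]` with no trailing bytes is a candidate. It still needs confirming against the serializer IL.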

Parallelism: 3.1 and 3.3 can be developed in parallel because the
operator can create the sandbox tag manually via SMC while SDK code
is being written. 3.2 cannot meaningfully proceed until 3.1 (or the
manual tag) exists. 3.4 method discovery is cheap and may eliminate
its own scope.

## 4. RE Steps in Execution Order

For each workstream above, run these five steps. They mirror the read
and event flows that recovered the existing protocol.

### 4.a Static method discovery

Find the native serializer:

```powershell
dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- methods current\aahClientManaged.dll AddS
dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- methods current\aahClientManaged.dll EnsureTag
dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- methods current\aahClientManaged.dll DeleteTag
```

Dump IL for each method of interest:

```powershell
dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- dnlib-method --instructions current\aahClientManaged.dll
```

Save sanitized excerpts to
`docs/reverse-engineering/dnlib--il-latest.txt`.
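
Before saving an excerpt, you can run the safety-rule keyword check on the text itself. This is a minimal Python sketch, and `find_sensitive_lines` / `check_excerpt` are hypothetical helpers. It mirrors only the four keyword alternatives visible in that `rg` pattern. The documented pattern has further alternatives that are not reproduced here, so this does not replace the `rg` scan.

```python
import re
from pathlib import Path

# Mirrors the visible alternatives of the safety-rule `rg -n "(?i)(...)"` scan.
# The documented pattern has further alternatives that are not copied here.
SENSITIVE = re.compile(r"(?i)(password|credential|secret|token)")


def find_sensitive_lines(text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, line) pairs that match, like `rg -n`."""
    return [(number, line)
            for number, line in enumerate(text.splitlines(), start=1)
            if SENSITIVE.search(line)]


def check_excerpt(path: Path) -> bool:
    """Print each match as path:line:text; return True only if nothing matched."""
    hits = find_sensitive_lines(path.read_text(encoding="utf-8"))
    for number, line in hits:
        print(f"{path}:{number}:{line}")
    return not hits
```

IL listings routinely mention metadata tokens. Expect `token` hits that need a manual look rather than an automatic rejection.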

### 4.b Wire-byte capture for the request

Use the same IL-rewrite tooling that captured the 27 outgoing event calls:

```powershell
$captureDir = "artifacts\reverse-engineering\instrumented-wcf-writemessage-writes"
New-Item -ItemType Directory -Force -Path $captureDir | Out-Null
# Emit a WriteMessage-instrumented copy of the managed wrapper into the capture dir.
dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- instrument-wcf-writemessage current\aahClientManaged.dll "$captureDir\aahClientManaged.dll"
# Copy-Item does not create missing parent directories, so create current-copy first.
New-Item -ItemType Directory -Force -Path "$captureDir\current-copy" | Out-Null
Copy-Item -Force "$captureDir\aahClientManaged.dll" "$captureDir\current-copy\aahClientManaged.dll"
# Capture file path for the instrumented assembly.
$env:AVEVA_HISTORIAN_RE_CAPTURE = (Resolve-Path $captureDir).Path + "\writemessage-capture-write-latest.ndjson"
```

A new harness scenario, `--scenario write`, needs to be added to
`tools/AVEVA.Historian.NativeTraceHarness` to drive the native
wrapper's `AddStreamValues2` against the sandbox tag. Suggested
new args: `--write-sandbox-tag`, `--write-value`.

### 4.c Wire-byte capture for the response

Use the symmetric `instrument-wcf-readmessage` command:

```powershell
# Same output path as 4.b: this replaces the WriteMessage-instrumented copy,
# so run it as a separate capture session (or point it at another directory).
dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- instrument-wcf-readmessage current\aahClientManaged.dll "$captureDir\aahClientManaged.dll"
```

The `AddS2` success response is expected to be just `true` plus an
empty `errorBuffer`. **Capture at least one negative case** (a write
to a non-existent tag, or a write with a malformed CDataType) so the
orchestrator can surface diagnostics like
`HistorianWcfEventOrchestrator.LastErrorBufferDescription`.

### 4.d Decode against IL

Strip the SOAP/MDAS envelope. Align byte offsets against the native
serializer IL from 4.a; the `ldc.i4 / call WriteByte` sequence makes
field order and constants explicit. Cross-reference the `CDataType`
table from `data-query-request-ctor-il-latest.txt` to interpret typed
value bytes. Write a parser-and-builder pair and verify it against the
captured bytes before committing.
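
The parser-and-builder pair can start from the §3.2 layout guess and be tightened as captures come in. This is a minimal Python sketch, and `build_pbuf`, `parse_pbuf`, and `Sample` are hypothetical names. It assumes the following, none of which is confirmed yet:

- little-endian fields;
- the GUID in .NET `Guid.ToByteArray()` order (Python's `uuid.bytes_le`);
- Windows FILETIME timestamps;
- an 8-byte double for the analog-double `CDataType` (code 1 in the type-code table).

The parser rejects trailing bytes, so a capture that does not fit the guess fails loudly instead of half-decoding.

```python
import struct
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# HYPOTHETICAL layout from the §3.2 guess; widths and byte order are assumptions.
HEADER = struct.Struct("<HI")            # UInt16 version, UInt32 sampleCount
SAMPLE_PREFIX = struct.Struct("<16sQB")  # tagId GUID, FILETIME, qualityByte
ANALOG_DOUBLE = 1                        # CDataType code for analog double
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


@dataclass(frozen=True)
class Sample:
    tag_id: uuid.UUID
    timestamp_utc: datetime  # must be timezone-aware UTC
    quality: int
    value: float


def to_filetime(ts: datetime) -> int:
    """Windows FILETIME: 100-ns intervals since 1601-01-01 UTC."""
    delta = ts - FILETIME_EPOCH
    return (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10


def from_filetime(ticks: int) -> datetime:
    # Truncates to microseconds, the resolution of Python datetime.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def build_pbuf(version: int, samples: list[Sample], cdatatype: int = ANALOG_DOUBLE) -> bytes:
    if cdatatype != ANALOG_DOUBLE:
        raise NotImplementedError("only the analog-double value encoding is sketched")
    out = bytearray(HEADER.pack(version, len(samples)))
    for s in samples:
        # bytes_le matches .NET Guid.ToByteArray() ordering (assumed wire order).
        out += SAMPLE_PREFIX.pack(s.tag_id.bytes_le, to_filetime(s.timestamp_utc), s.quality)
        out += struct.pack("<d", s.value)
    return bytes(out)


def parse_pbuf(buf: bytes) -> tuple[int, list[Sample]]:
    version, count = HEADER.unpack_from(buf, 0)
    offset = HEADER.size
    samples = []
    for _ in range(count):
        guid_bytes, ticks, quality = SAMPLE_PREFIX.unpack_from(buf, offset)
        offset += SAMPLE_PREFIX.size
        (value,) = struct.unpack_from("<d", buf, offset)
        offset += 8
        samples.append(Sample(uuid.UUID(bytes_le=guid_bytes), from_filetime(ticks), quality, value))
    if offset != len(buf):
        raise ValueError(f"{len(buf) - offset} trailing bytes; layout guess is incomplete")
    return version, samples
```

Round-trip a captured `pBuf` through `parse_pbuf` and back through `build_pbuf`. It should reproduce the capture byte-for-byte; any mismatch points at a field the guess gets wrong.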

### 4.e Implement managed serializer + tests

New code under `src/AVEVA.Historian.Client/Wcf/`:

- `HistorianAddStreamValuesProtocol.cs`: `Serialize(...)` returns
  `byte[] pBuf`, mirroring `HistorianAddTagsProtocol`.
- Extend (or split) `HistorianAddTagsProtocol` for the analog /
  discrete / string `EnsT2` shapes.
- `HistorianWcfWriteOrchestrator.cs`: chains
  `Hist.GetV → Hist.ValCl × 2 → Hist.Open2 → UpdC3 → priming chain (TBD per §3.2) → AddS2 loop → Close2`.

Public surface on `HistorianClient`:

- `WriteValueAsync(tag, value, timestampUtc, quality)`
- `WriteValuesAsync(IReadOnlyList)`
- `EnsureTagAsync(HistorianTagDefinition)`
- `DeleteTagAsync(string tagName)`

Until evidence supports each path, throw
`ProtocolEvidenceMissingException` (mirroring the existing read
guardrail).

Unit tests under `tests/AVEVA.Historian.Client.Tests/Wcf/`:

- `WcfAddStreamValuesProtocolTests`: golden-byte tests for one
  analog, one discrete, and one string write.
- `WcfEnsureTagsProtocolTests`: golden-byte tests for the
  analog/discrete/string `CTagMetadata` shapes.
- Extend `ProtocolGuardrailTests` so any not-yet-implemented write
  path still throws `ProtocolEvidenceMissingException`.

Live integration tests in `HistorianClientIntegrationTests.cs`,
gated on `HISTORIAN_WRITE_SANDBOX_TAG`:
`WriteValueAsync_WithinDocumentedWindow_PersistsToHistorianDb`
writes a unique value, reads it back via `ReadRawAsync`, and
verifies it via direct `sqlcmd` against the History extension table.

## 5. Order of Operations

```
3.4 method discovery (cheap; may eliminate scope)
        │
        ▼
3.1 EnsT2 (analog/discrete/string) ──► sandbox tag exists
        │
        ├─────────────────────────────┐
        ▼                             ▼
3.2 AddS2 (priority 1)            3.3 DelT (sandbox cleanup)
        │
        ▼
3.4 ModifyData/DeleteData (only if 3.4 confirmed scope)
        │
        ▼
public surface, golden-byte tests, integration tests
```

3.2 is the headline win and depends only on 3.1 (or a manually
created sandbox tag).
3.3 must land before any commit that
programmatically creates new tags; until then, manual SMC deletion
is the documented rollback.

## 6. Risks and Mitigations

### 6.a Auth chain may differ for writes

Reads use `Hist.Open2(ConnectionMode = 0x402)`. Events use the same
`0x402` plus a Stat-priming chain. Writes may need a different
mode: the handoff notes that `0x501` was an unverified guess for
events, and writes may legitimately need `0x401` or another value.

Mitigation: capture the *full* WriteMessage sequence for a native
write session (not just `AddS2`) to see which `Open2` payload and
priming calls the native wrapper sends.

### 6.b Server-side session-table requirement

Writes may require `RTag2` after `EnsT2` and before `AddS2`, since the
event flow needs `RTag2(CmEventTagId)`. The "tag identifier" the
server returns from `EnsT2` may differ from the GUID the client
seeded.

Mitigation: capture the analog `EnsT2` `OutBuff` (the event flow's was
a 45-byte echo). Then verify whether subsequent `AddS2` payloads
reference the client-seeded GUID, the server-returned GUID, or a
numeric `wwTagKey`. SQL ground truth: `SELECT TagName, wwTagKey
FROM Tag WHERE TagName = '...'`.

### 6.c Silent-success failure mode

`AddS2` may return `true` even though no row appears in the History
extension table. The engine silently drops samples that fall outside
the `FutureTimeThreshold` / `RealTimeWindow` system parameters (which
the event flow now reads).

Mitigation: always write at `DateTime.UtcNow`, and cross-check with
SQL after every test:

```sql
SELECT TOP 5 DateTime, Value, QualityDetail
FROM History
WHERE wwTagKey = (SELECT wwTagKey FROM Tag WHERE TagName = @sandbox)
  AND DateTime BETWEEN @windowStart AND @windowEnd
ORDER BY DateTime DESC;
```

Surface `FutureTimeThreshold` / `RealTimeWindow` via the existing
`GetSystemParameterAsync` so failures are diagnosable.
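
This is a minimal sketch of the pre-write check and of computing `@windowStart` / `@windowEnd` for the query above. `accepts_timestamp` and `sql_window` are hypothetical helpers. The sketch makes three assumptions:

- `RealTimeWindow` and `FutureTimeThreshold` have already been read (for example via `GetSystemParameterAsync`) and converted to `timedelta`; their native units are not established by this plan.
- A sample is only safe to write inside `[now - RealTimeWindow, now + FutureTimeThreshold]`; that acceptance rule is also an assumption.
- The 5-second slack is an arbitrary illustration.

```python
from datetime import datetime, timedelta, timezone


def accepts_timestamp(ts: datetime, now: datetime,
                      real_time_window: timedelta,
                      future_threshold: timedelta) -> bool:
    """ASSUMED acceptance rule: the sample must fall inside
    [now - RealTimeWindow, now + FutureTimeThreshold]."""
    return now - real_time_window <= ts <= now + future_threshold


def sql_window(ts: datetime,
               slack: timedelta = timedelta(seconds=5)) -> tuple[datetime, datetime]:
    """@windowStart / @windowEnd for the History cross-check around one write.

    The 5-second default slack is illustrative, not a documented value.
    """
    return ts - slack, ts + slack
```

Before binding the window, check whether the History extension table reports `DateTime` in UTC or in server-local time. A UTC window compared against local-time rows would make a successful write look silently dropped.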

### 6.d Storage service vs History service

`IStorageServiceContract` also exposes `AddT/AddS/AddS2/DelT`. The
working hypothesis is that `/Hist` is client-facing and `/Stor` is
engine-internal, but this is not yet verified.

Mitigation: the WriteMessage capture (§4.b) will show the actual
service path on the wire. If it goes to `/Stor`, update the
orchestrator. Do NOT preemptively implement against both.

### 6.e Parameter-name mismatches

The handoff already flagged `EnsT`, `EnsT2`, `RTag2`, `ExKey`, `StJb`,
and `GtJb` for the same `inBuff`/`inputBuffer` mismatch class that
broke reads for weeks. Until each op is audited against the server
contract, a mismatched parameter binds to null on the server, which
then throws a `NullReferenceException`.

Mitigation: before the first write WriteMessage capture, run an
`ildasm` audit against `aahClientAccessPoint.exe` for the exact
parameter names of `EnsT2`, `AddS2`, and `DelT`. Reconcile them
with the existing `[MessageParameter]` attributes.

### 6.f Customer-data exposure in capture files

Write captures contain the sandbox tag name and any value the test
wrote. These are not secrets, but they are noise.

Mitigation: keep all
`instrumented-wcf-writemessage-writes/` artifacts under
`artifacts/` (already gitignored). Sanitize tag names to
`` before committing decoded bytes into
`docs/reverse-engineering/`.

## 7. Success Criteria

Per op:

- **`EnsT2(analog)`**: `EnsureTagAsync(new HistorianTagDefinition { Name = sandbox, DataType = Analog })`
  returns success, and
  `sqlcmd -E -S . -d Runtime -Q "SELECT TagName FROM Tag WHERE TagName = '...'"`
  returns one row.
- **`EnsT2(discrete, string)`**: same shape with the corresponding
  `DataType`; the SQL check uses the `DiscreteTag` / `StringTag` views.
- **`AddS2`**: `WriteValueAsync(sandbox, 42.0, DateTime.UtcNow)`
  returns success; `ReadRawAsync` returns the value; and
  `SELECT TOP 1 Value FROM History WHERE wwTagKey = ? AND DateTime BETWEEN ? AND ?`
  returns the same value.
- **`DelT`**: `DeleteTagAsync(sandbox)` returns success and SQL
  returns zero rows from `Tag`.
- **`ModifyData` / `DeleteData`**: deferred until §3.4 method
  discovery confirms scope.

Cross-cutting:

- All new code in `src/AVEVA.Historian.Client/` is pure managed
  .NET 10. No new P/Invoke beyond the existing `HistorianSspiClient`.
- Every new op has a golden-byte unit test.
- `dotnet test .\Histsdk.slnx --no-build --logger "console;verbosity=minimal"`
  passes 100%.
- With `HISTORIAN_HOST=localhost` and
  `HISTORIAN_WRITE_SANDBOX_TAG=RetestSdkWriteSandbox` set, the write
  integration tests pass and leave zero residue (the test's `Dispose`
  calls `DelT` for cleanup).
- The sanitization scan returns no real secrets.
- `CLAUDE.md` "Required SDK Surface" is updated to add the new write
  ops. This is a SCOPE CHANGE that must land *alongside* the
  evidence, not before: do not update the SDK surface doc until
  3.1 + 3.2 are at least live-test-green.

## 8. Open Questions

1. Does `AddS2` go through `/Hist` or `/Stor` on the wire?
2. Does the sandbox tag need a one-time pre-configuration in System
   Management Console before `EnsT2` will accept it from a
   client (e.g. for `Storage` / `wwDomain` rows the wire protocol
   may not be able to populate)?
3. What `ConnectionMode` does the native wrapper use for write
   sessions: `0x402` (read mode reused), `0x401`, or something
   else?
4. Does `EnsT2(analog)` require any optional ArchestrA
   engineering-units fields, or are they purely cosmetic? The answer
   affects how minimal `HistorianTagDefinition` can be.
5. Are there server-side throttles on writes (max samples per `AddS2`,
   max calls per second), and do they need to be surfaced as batching
   guidance?
6. What does the server return when `AddS2` is called with a
   timestamp older than the tag's earliest stored block? Some
   historians silently drop, some error, some accept-and-overwrite.
7. Should the SDK expose write quality as the same
   `HistorianSample.Quality` enum used on reads, or as a smaller
   subset (good/bad)?
8. Is there a managed-side `DelT` path at all? If
   `aahClientManaged` exposes no deletion path and deletion is only
   possible via SMC, §3.3 becomes "manual SMC only" and must be
   documented as such.

## 9. Docs To Update Once Each Workstream Lands

- `CLAUDE.md` "Required SDK Surface": add `WriteValueAsync`,
  `EnsureTagAsync`, `DeleteTagAsync` once 3.1+3.2+3.3 land.
- `AGENTS.md` "Required SDK Surface": same; also update the "alarm-event
  write path is dormant" note.
- `docs/reverse-engineering/handoff.md`: add a "Write-flow prereqs"
  section symmetric to the existing "Event-flow prereqs".
- `docs/reverse-engineering/wcf-contract-evidence.md`: add evidence
  rows for `EnsT2(analog/discrete/string)`, `AddS2`, `DelT`.
- `docs/reverse-engineering/implementation-status.md`: flip the
  status from "out of scope" to "implemented".
- `README.md`: update the operation status table.