Initial project state: .NET reference, design, Rust port (M0+M1), evidence
rust / build / test / clippy / fmt (push) Has been cancelled
Layout:
- src/                 .NET 10 x64 reference: MxNativeCodec, MxNativeClient,
                       MxAsbClient, probes, tests, harnesses. Executable spec.
- design/              Architectural plan for the Rust port (M0–M6), error
                       model, protocol invariants, risks (R1–R16), adversarial
                       review log (review.md).
- rust/                Rust workspace. M0 skeleton + M1 codec parity.
                       mxaccess-codec: 215 unit tests + 2 cross-implementation
                       parity tests (byte-identical against .NET reference).
                       Other crates are M0 stubs awaiting M2+.
- captures/            Frida + netsh + pcap evidence per CLAUDE.md
                       ("captures are evidence, not throwaway logs").
- analysis/            Decompiled C# (frida/proxy/decompiled-*),
                       Ghidra exports for native DLLs (`exports/` only —
                       working state at `projects/` and AVEVA's input
                       binaries at `input/` are gitignored).
- docs/                Reverse-engineering reference docs.
- tools/               Setup-LiveProbeEnv.ps1 (Infisical credential fetcher),
                       Compute-Crc.ps1 (.NET parity helper).
- .github/workflows/   Rust CI: fmt + build + test + clippy on Windows.
- LICENSE              MIT (Joseph Doherty, 2026).
Verified:
- cargo test --workspace → 217 passed (215 unit + 2 .NET parity), 0 failed
- cargo clippy --workspace -- -D warnings → clean
- cargo fmt --all -- --check → clean
- cargo publish --dry-run -p mxaccess-codec → packages cleanly
Excluded from history (see .gitignore):
- **/bin, **/obj, **/target — build artifacts
- analysis/ghidra/projects/ — Ghidra working state (regenerable)
- analysis/ghidra/input/ — AVEVA proprietary DLLs (vendor IP)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,127 @@
|
||||
# Overview
|
||||
|
||||
## Mission
|
||||
|
||||
Build a **native Rust replacement for AVEVA/Wonderware MXAccess** that gives Rust applications byte-equivalent access to the AVEVA System Platform without depending on the 32-bit `LmxProxy.dll` / `NmxSvcps.dll` interop chain.
|
||||
|
||||
The replacement ships in two layers:
|
||||
|
||||
1. **Raw layer** — a faithful Rust reimplementation of the wire protocol (codec + transport + session). Every byte over the wire matches what native MXAccess sends, validated against Frida-captured baselines. The raw layer's API is `unsafe`-free and Tokio-aware (it uses Tokio for I/O) but its codec is pure and runtime-agnostic.
2. **Async layer** — an idiomatic Tokio façade on top of the raw layer: typed errors, `Send + Sync` handles, `async fn` operations, structured subscription `Stream`s, drop and `CancellationToken` cancellation, `tracing` instrumentation. This is what most consumers reach for.
|
||||
|
||||
Both layers ship in one Cargo workspace; the raw crates are useful on their own for power users who need byte-level control or who are integrating into a non-standard runtime.
|
||||
|
||||
## Why two layers
|
||||
|
||||
Inverting the order would compromise correctness. If the public API is async-first, the protocol behavior gets shaped to fit the API. We saw the alternative work in the .NET reference: every async method bottoms out in a sync codec call (`NmxTransferEnvelopeTemplate.Encode` — see `src/MxNativeCodec/NmxTransferEnvelopeTemplate.cs:33`) because the wire format has no "async" — it has bytes. Putting bytes first lets us validate against captures with a pure round-trip test, then layer ergonomics on top.
|
||||
|
||||
The split also maps cleanly onto the existing .NET tree:
|
||||
|
||||
| .NET project | Rust analogue (raw) | Rust analogue (async) |
|---|---|---|
| `MxNativeCodec` | `mxaccess-codec` | (codec is shared) |
| `MxNativeClient` (DCE/RPC + NTLM + IRemUnknown + INmxService2) | `mxaccess-rpc`, `mxaccess-nmx`, `mxaccess-callback` | (transport is shared) |
| `MxNativeClient` (`MxNativeSession`, `MxNativeCompatibilityServer`) | (raw layer ends at transport) | `mxaccess` (async session, optional `mxaccess-compat` shim) |
| `MxAsbClient` | `mxaccess-asb` (codec+transport) | `mxaccess` (async ASB session) |
|
||||
|
||||
The session-level state in `MxNativeSession` (subscription registry, correlation-id bookkeeping, recovery state, callback routing — `src/MxNativeClient/MxNativeSession.cs:90-125, 312-351, 573`) lives in the async `mxaccess` crate, **not** in `mxaccess-nmx`. The raw `mxaccess-nmx` crate exposes the `INmxService2` client + envelope codec + a low-level register/advise/write surface so power users *can* drive it directly (per the "byte-level control" promise above) — but it does not own correlation or recovery, because those are session-level concerns that span both transports. A consumer using `mxaccess-nmx` standalone is responsible for its own correlation-id table.
|
||||
|
||||
## Architectural principles
|
||||
|
||||
These are non-negotiable. They are informed by what went wrong in the reverse-engineering effort and what the existing tree gets right.
|
||||
|
||||
1. **Do not fabricate protocol behavior.** Every wire shape in the Rust port must be backed by a Frida capture, a decompiled artifact, or a live probe. When extending, cite the evidence — and capture a new fixture if one does not exist. The native codec deliberately does not zero "unknown" bytes; the Rust port mirrors this.
2. **Round-trip preservation.** Encoder and decoder must be bijective on observed traffic. Codec types keep the original byte buffer alongside parsed fields so unknown bytes survive a parse + re-encode. `NmxTransferEnvelopeTemplate` and `ObservedWriteBodyTemplate` in the .NET reference exist for this reason — Rust analogues must too.
3. **No `unsafe` in the public API surface.** Public types and trait methods across all crates are safe. Internal `unsafe` is permitted but confined to `mxaccess-rpc`, where COM activation / `IUnknown` calls via the `windows` crate are unavoidable (see principle 6) — every such call must be wrapped in a safe abstraction at the crate boundary. Codec crates (`mxaccess-codec`, `mxaccess-asb-nettcp`) remain `#![forbid(unsafe_code)]`: no raw pointers, no `transmute`, multi-byte field access via `bytes::Buf` / `byteorder`, memory layout never derived from `#[repr(C)]`.
4. **x64 only.** The whole point of the replacement is escaping the 32-bit `NmxSvcps.dll` proxy/stub. The Rust workspace targets `x86_64-pc-windows-msvc` (and optionally `x86_64-pc-windows-gnu`). No 32-bit code paths anywhere; cross-compile to `i686-*` is unsupported by design.
5. **Windows-first, cross-platform-aware.** NTLM, DPAPI, and Galaxy SQL Server are Windows realities for AVEVA deployments. Crate boundaries are drawn so the codec, ASB net.tcp framing (MC-NMF + MC-NBFX/NBFS — *not* SOAP/XML on the wire; see `src/MxAsbClient/MxAsbDataClient.cs:660-685` where `NetTcpBinding(SecurityMode.None)` selects the default `BinaryMessageEncodingBindingElement`), and protocol logic compile on Linux even when the platform-bound transports do not. Cross-platform reach is a stretch goal — see `70-risks-and-open-questions.md`.
6. **COM via `windows-rs`** when COM types are unavoidable: OBJREF building, IPID/OXID/OID handling, GUID literals. Raw bytes (NDR encoding, NMX envelope, write bodies) are hand-rolled: the surface is small enough that a generated stub would obscure the wire and compromise principle 1.
7. **Galaxy access is direct SQL.** No LMX. The Rust port queries `dbo.gobject` / `dbo.instance` / `dbo.dynamic_attribute` (and the package-inheritance CTE) the same way `GalaxyRepositoryTagResolver.cs` does (see `src/MxNativeClient/GalaxyRepositoryTagResolver.cs:215, 253, 257`), then computes CRC-16/IBM signatures locally to build `MxReferenceHandle`s. **Note**: `CLAUDE.md` lists the SQL surface as `aa_attribute` / `aa_object` / `mx_attribute_category` — that is incorrect. Those tables do not exist in the resolver source; the actual tables are `dbo.gobject` / `dbo.instance` / `dbo.dynamic_attribute` as cited above. Treat this design doc as authoritative over `CLAUDE.md` for SQL surface, and update `CLAUDE.md` next time it is touched.
8. **One Tokio runtime, multi-thread by default.** The async layer assumes `#[tokio::main(flavor = "multi_thread")]` semantics; current_thread is supported but not the default. No `tokio::spawn` from inside `Drop`; no blocking calls inside `async fn`. Drop-based cancellation is implemented by sending a cleanup request (e.g. `UnAdvise` for a `Subscription`, `RemoveSubscriberEngine`/`UnregisterEngine` for the last `Session` clone) over a `tokio::sync::mpsc` or `tokio::sync::oneshot` channel to a long-lived connection task that was spawned at session construction time. The connection task's lifetime exceeds any individual `Subscription`, so `Drop` itself never spawns and never blocks. This mirrors the .NET reference's synchronous teardown path (`MxNativeSession.cs:483-507`), where `UnAdvise` per subscription, `RemoveSubscriberEngine` per publisher, and `UnregisterEngine` are all invoked from a single dispose-time loop on a pre-existing service handle.
9. **Two transports, one façade.** `Session` is parameterised over a `Transport` trait. `NmxTransport` and `AsbTransport` are independent implementations; capability is queryable. A `Session` constructed with a single transport returns `Error::Unsupported` for operations that transport cannot reach (e.g. `Session::activate(item)` on an ASB-only `Session` — ASB has no `Activate`/`Suspend`/supervisory-advise surface; see non-goal 5). A `Session` constructed via the dual-transport builder (`Session::builder().with_nmx(...).with_asb(...).build()`) routes callback-only operations to NMX automatically and the regular tag data plane to ASB, matching the deployment shape recommended in `docs/ASB-Native-Integration-Decision.md`. Routing is static at session-build time; `Session` does not silently activate a fallback transport at runtime.
10. **Status is data, errors are exceptional.** A non-Ok `MxStatus` on a returned data change is data the caller inspects, not a `Result::Err`. A non-Ok status returned from a synchronous-shaped operation (`write`, `read`) **is** an `Err`. This split mirrors the .NET reference and is the only sensible mapping; see `50-error-model.md`.
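
The drop path described in principle 8 can be sketched as below. This is a minimal illustration under stated assumptions, not the port's implementation: `Cleanup`, `Subscription`, `subscribe`, `spawn_connection_task`, and `drop_sends_unadvise` are hypothetical names, and `std::sync::mpsc` with a plain thread stands in for `tokio::sync::mpsc` and the long-lived connection task so the example has no dependencies.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical cleanup requests a handle may queue from `Drop`.
#[derive(Debug, PartialEq, Eq)]
pub enum Cleanup {
    UnAdvise { subscription_id: u32 },
}

/// Hypothetical subscription handle: it owns a sender, never a task.
pub struct Subscription {
    id: u32,
    cleanup_tx: Sender<Cleanup>,
}

impl Drop for Subscription {
    fn drop(&mut self) {
        // Queue the request and return: no spawn, no blocking wait.
        // A send error means the connection task is gone, so there is
        // nothing left to clean up.
        let _ = self.cleanup_tx.send(Cleanup::UnAdvise { subscription_id: self.id });
    }
}

/// Stand-in for the connection task created at session construction.
/// It outlives every `Subscription`; here it only records the requests
/// it receives instead of issuing real teardown calls.
pub fn spawn_connection_task(rx: Receiver<Cleanup>) -> JoinHandle<Vec<Cleanup>> {
    thread::spawn(move || rx.iter().collect())
}

pub fn subscribe(id: u32, cleanup_tx: &Sender<Cleanup>) -> Subscription {
    Subscription { id, cleanup_tx: cleanup_tx.clone() }
}

/// Drops one subscription and returns what the connection task observed.
pub fn drop_sends_unadvise() -> Vec<Cleanup> {
    let (tx, rx) = channel();
    let task = spawn_connection_task(rx);
    let sub = subscribe(7, &tx);
    drop(sub); // triggers the cleanup request
    drop(tx); // last sender gone, so the task's receive loop ends
    task.join().expect("connection task panicked")
}
```

The point of the shape is that `Drop` only enqueues; the task that performs the teardown already exists, which is why nothing needs to be spawned at drop time.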
|
||||
|
||||
## Non-goals (V1)
|
||||
|
||||
- WinSXS-style side-by-side install with the native MXAccess COM proxies.
- 32-bit clients. The Rust crates do not build for `i686-pc-windows-msvc`.
- A drop-in COM-visible `LMXProxyServer.LMXProxyServer` ProgId. The MXAccess shape is replicated as a Rust API; consumers that want to expose it as COM register a separate shim crate (`mxaccess-compat-com`, deferred to post-V1).
- Linux first-class support in V1. Crate boundaries do not preclude Linux later, but Galaxy SQL + DPAPI mean V1 ships Windows-only.
- ASB feature parity with NMX. ASB cannot reach callback-only semantics (Activate/Suspend, supervisory advise, OperationComplete). The Rust port routes those to NMX; ASB owns the regular tag data plane only. See `docs/ASB-Native-Integration-Decision.md`.
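
The static routing rule implied by the last non-goal (callback-only operations go to NMX, the regular data plane goes to ASB when both are present) might look like the sketch below. `Op`, `Route`, `Routing`, and this `Error` type are hypothetical illustrations, not the `mxaccess` API; the choice to send data-plane operations to ASB whenever it is configured follows the dual-transport description in this document.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Write,
    Subscribe,
    Activate,
    Suspend,
    AdviseSupervisory,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Route {
    Nmx,
    Asb,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    Unsupported(Op),
}

/// Which transports were configured when the session was built.
/// The decision is fixed at build time; there is no runtime fallback.
#[derive(Debug, Clone, Copy)]
pub struct Routing {
    pub has_nmx: bool,
    pub has_asb: bool,
}

impl Routing {
    pub fn route(&self, op: Op) -> Result<Route, Error> {
        // Operations ASB cannot reach at all.
        let callback_only = matches!(op, Op::Activate | Op::Suspend | Op::AdviseSupervisory);
        match (callback_only, self.has_nmx, self.has_asb) {
            (true, true, _) => Ok(Route::Nmx),
            (true, false, _) => Err(Error::Unsupported(op)),
            (false, _, true) => Ok(Route::Asb),
            (false, true, false) => Ok(Route::Nmx),
            (false, false, false) => Err(Error::Unsupported(op)),
        }
    }
}
```

An ASB-only configuration therefore rejects `Op::Activate` up front instead of trying another transport.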
|
||||
|
||||
## At-a-glance architecture
|
||||
|
||||
```
+----------------------------------------------------------------------+
|  Application (Rust, async)                                           |
+----------------------------------------------------------------------+
                                   |
                                   v   async fn / Stream<Item = DataChange>
+----------------------------------------------------------------------+
|  mxaccess (async layer)                                              |
|    - Session, Subscription, DataChange, MxValue                      |
|    - trait Transport { connect, register, write, advise, ... }       |
|    (read is NOT a transport primitive — it is a session-level helper |
|     composed from subscribe + first-result + drop, mirroring         |
|     MxNativeSession.ReadAsync at src/MxNativeClient/MxNativeSession.cs:312-359) |
|    - Drop-cancellable, tracing-instrumented, typed Error             |
+----------------------------------------------------------------------+
        |                                     |
        | NmxTransport                        | AsbTransport
        v                                     v
+---------------------------------+ +----------------------------------+
| mxaccess-nmx (NMX raw)          | | mxaccess-asb (ASB raw)           |
| INmxService2 client + envelope  | | IASBIDataV2 client + variant     |
+---------------------------------+ +----------------------------------+
       |                     |                        |
       v                     v                        v
+--------------+ +------------------------+ +--------------------+
| mxaccess-rpc | | mxaccess-callback      | | mxaccess-asb-nettcp|
| DCE/RPC PDU  | | INmxSvcCallback server | | MC-NMF framing +   |
| + NTLMv2 SSP | | + IRemUnknown          | | MC-NBFX/NBFS binary|
|              | |                        | | + DH/HMAC/AES      |
+--------------+ +------------------------+ +--------------------+
                                   |
                                   v
+----------------------------------------------------------------------+
|  mxaccess-codec (pure, no I/O)                                       |
|    MxReferenceHandle, NmxTransferEnvelope, write/advise/subscribe    |
|    bodies, MxStatus, MxValueKind, MxDataType, ASB Variant            |
+----------------------------------------------------------------------+
                    |                           |
                    v                           v
          +--------------------+      +-------------------+
          | mxaccess-galaxy    |      | windows (crate)   |
          | SQL tag resolver   |      | OBJREF/IID/OXID   |
          +--------------------+      +-------------------+
```
|
||||
|
||||
## Phasing summary
|
||||
|
||||
Detailed roadmap in `60-roadmap.md`. At a glance:
|
||||
|
||||
- **M0** — Workspace skeleton, CI, fixture infrastructure.
- **M1** — `mxaccess-codec` complete; round-trips every Frida fixture.
- **M2** — `mxaccess-rpc` + `mxaccess-callback`: live `RegisterEngine2` against `NmxSvc.exe`.
- **M3** — `mxaccess-nmx` + `mxaccess-galaxy`: live scalar write/subscribe.
- **M4** — `mxaccess` async façade over NMX. End-to-end consumer-grade API.
- **M5** — `mxaccess-asb` + `mxaccess-asb-nettcp`: ASB transport plugged into the same `Session`.
- **M6** — `mxaccess-compat` + production hardening (recovery, perf, observability).
|
||||
|
||||
The order is chosen so each milestone's exit criterion is independently observable: codec parity (M1), live RPC (M2), live data (M3), consumer API (M4), alternate transport (M5), shipping (M6).
|
||||
|
||||
## Adjacent tooling (`C:\Users\dohertj2\Desktop\wwtools`)
|
||||
|
||||
A sibling toolkit at `C:\Users\dohertj2\Desktop\wwtools` collects WW/AVEVA-adjacent CLIs and reference material. Several are load-bearing for this project — they replace credentials we would otherwise inline, and provide the comparison harnesses M2–M5 need. See `wwtools/CLAUDE.md` for the authoritative index.
|
||||
|
||||
| Tool | Path | Used by Rust port for |
|---|---|---|
| `secrets/` | `wwtools/secrets/` | **Credential retrieval.** Self-hosted Infisical CLI (`infisical.exe`) + `Get-Secret.ps1` PowerShell helper backed by `https://infisical.dohertylan.com`. Replaces the DPAPI-only path in `mxaccess-asb` (R9): live probes and CI fetch the ASB shared secret, NTLM credentials, Galaxy DB connection string, etc. via `secret <KEY>` instead of inlining plaintext. The `AsbCredentials::shared_secret(&[u8])` constructor pairs with this — wire it via `secret ASB_SHARED_SECRET` in probe scripts. |
| `mxaccesscli/` | `wwtools/mxaccesscli/` | **Parity harness.** `.NET Framework 4.8 / x86` CLI built on `LMXProxyServerClass` — i.e. the original 32-bit MxAccess COM proxy. Use as a third comparison point for cross-implementation parity (alongside `src/MxNativeClient.Probe`). Read/write/subscribe semantics here are the proven ground truth for what consumers expect from the Rust port's compat shim. |
| `graccesscli/` | `wwtools/graccesscli/` | **Galaxy configuration setup.** `.NET Framework 4.8 / x86` CLI over GRAccess COM. Use to provision test objects/attributes for live probes (M3+) without manual IDE clicks — scriptable galaxy setup for CI and reproducible test fixtures. |
| `grdb/` | `wwtools/grdb/` | **Galaxy SQL schema reference.** Cross-check `mxaccess-galaxy`'s `tiberius` queries against the documented schema, hierarchy queries, and contained-name ↔ tag-name translation rules. M3 schema correctness is verified here before M3 lands. |
| `aalogcli/` | `wwtools/aalogcli/` | **Debugging.** Reads System Platform `.aaLGX` binary logs. Use to correlate Rust-port runtime errors with what NmxSvc.exe / LMX adapters log on the System Platform side. |
| `histdb/` | `wwtools/histdb/` | **Out of scope for V1** but documented here so the Rust port doesn't accidentally re-implement Historian retrieval in `mxaccess`. The tag data plane (NMX/ASB) and the historical-data plane (`INSQL`, `wwXxx` extensions) are distinct subsystems. |
| `aot/` | `wwtools/aot/` | Reference material (ArchestrA Object Toolkit dev guide, API reference). Background only — the Rust port does not consume AOT primitives directly; the wire shapes are observed end-to-end in `captures/`. |
|
||||
|
||||
**Operational note:** `wwtools/secrets/secret <KEY>` is the canonical credential-fetch path on this workstation. The Rust port's `live`-feature integration tests should source `MX_GALAXY_DB`, `MX_NMX_HOST`, `MX_ASB_SHARED_SECRET`, etc. via `secret` invocations in the test setup script, not via inline plaintext or `.env` files committed to the repo. This supersedes the "inline credentials are fine for the maintainer's workstation" stance implied by the M2/M3 live-probe DoDs in `60-roadmap.md`.
|
||||
@@ -0,0 +1,441 @@
|
||||
# Raw layer
|
||||
|
||||
The raw layer is the byte-accurate Rust reimplementation of MXAccess. It lives in three sub-layers:
|
||||
|
||||
```
mxaccess-codec        pure encoding/decoding, no I/O
        |
        v
mxaccess-rpc          DCE/RPC + NTLMv2 + OBJREF + OXID resolution
mxaccess-callback     INmxSvcCallback RPC server (callback exporter)
        |
        v
mxaccess-nmx          INmxService2 client + Galaxy SQL resolver
mxaccess-asb          IASBIDataV2 client (alternate data plane)
```
|
||||
|
||||
Codec is pure and runtime-agnostic. Transport crates use Tokio for I/O. Neither layer exposes Tokio in the public types except through `async fn` signatures.
|
||||
|
||||
## `mxaccess-codec`
|
||||
|
||||
Pure protocol codec. No I/O. Compiles on every Rust target including non-Windows. Allocations only where the protocol mandates variable-length fields (string values, array payloads, registration bodies).
|
||||
|
||||
### `MxReferenceHandle` (20 bytes)
|
||||
|
||||
Source: `src/MxNativeCodec/MxReferenceHandle.cs:5–120`.
|
||||
|
||||
```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct MxReferenceHandle {
    pub galaxy_id: u8,
    // Byte 1 is reserved (always 0). Not exposed publicly.
    pub platform_id: u16,
    pub engine_id: u16,
    pub object_id: u16,
    object_signature: u16, // private — CRC-16/IBM of lowercase UTF-16LE object tag
    pub primitive_id: i16,
    pub attribute_id: i16,
    pub property_id: i16,
    attribute_signature: u16, // private — CRC-16/IBM of lowercase UTF-16LE attribute name
    pub attribute_index: i16, // -1 array, 0 scalar
}

impl MxReferenceHandle {
    pub const ENCODED_LEN: usize = 20;

    /// The only constructor that derives signatures from names. Recomputes both
    /// `object_signature` and `attribute_signature` so they cannot desync from
    /// the names that produced them.
    pub fn from_names(
        galaxy_id: u8,
        platform_id: u16,
        engine_id: u16,
        object_id: u16,
        object_tag_name: &str,
        primitive_id: i16,
        attribute_id: i16,
        property_id: i16,
        attribute_name: &str,
        is_array: bool,
    ) -> Self;

    /// Parse from a captured 20-byte handle. Signatures come straight from the
    /// wire; the caller is asserting the bytes are authoritative.
    pub fn parse(bytes: &[u8; Self::ENCODED_LEN]) -> Self;
    pub fn encode(self, dst: &mut [u8; Self::ENCODED_LEN]);

    /// Read-only accessors — there is intentionally no `set_*_signature`.
    pub fn object_signature(self) -> u16 { self.object_signature }
    pub fn attribute_signature(self) -> u16 { self.attribute_signature }

    /// Returns a new handle with `attribute_name`'s signature recomputed,
    /// preserving every other field. Use this instead of mutating in place.
    pub fn with_attribute_name(self, attribute_name: &str) -> Self;
    pub fn with_object_tag_name(self, object_tag_name: &str) -> Self;

    pub fn compute_name_signature(name: &str) -> u16;
}
```
|
||||
|
||||
`object_signature` and `attribute_signature` are **derived** values and are the only fields not exposed `pub`. There is no setter that takes a raw `u16` signature without a corresponding name — the only way to update a signature is to hand a name in, which forces a recomputation. This keeps `(object_tag_name, object_signature)` and `(attribute_name, attribute_signature)` consistent by construction. (The .NET reference is more permissive — `MxReferenceHandle` is a record with public init-only signature fields — but mirroring that in Rust would invite caller bugs that the wire format silently rejects with `0x80070057`.)
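
For orientation, the 20-byte layout implied by the field order above can be sketched as a flat encode/parse pair. This is an assumption-laden sketch: `HandleSketch` is a hypothetical stand-in with every field public, and little-endian field encoding is assumed rather than cited from `MxReferenceHandle.cs`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct HandleSketch {
    pub galaxy_id: u8,
    pub platform_id: u16,
    pub engine_id: u16,
    pub object_id: u16,
    pub object_signature: u16,
    pub primitive_id: i16,
    pub attribute_id: i16,
    pub property_id: i16,
    pub attribute_signature: u16,
    pub attribute_index: i16,
}

impl HandleSketch {
    pub const ENCODED_LEN: usize = 20;

    /// Serialise in declaration order; byte 1 is the reserved zero byte.
    pub fn encode(self) -> [u8; 20] {
        let mut out = [0u8; 20];
        out[0] = self.galaxy_id;
        out[2..4].copy_from_slice(&self.platform_id.to_le_bytes());
        out[4..6].copy_from_slice(&self.engine_id.to_le_bytes());
        out[6..8].copy_from_slice(&self.object_id.to_le_bytes());
        out[8..10].copy_from_slice(&self.object_signature.to_le_bytes());
        out[10..12].copy_from_slice(&self.primitive_id.to_le_bytes());
        out[12..14].copy_from_slice(&self.attribute_id.to_le_bytes());
        out[14..16].copy_from_slice(&self.property_id.to_le_bytes());
        out[16..18].copy_from_slice(&self.attribute_signature.to_le_bytes());
        out[18..20].copy_from_slice(&self.attribute_index.to_le_bytes());
        out
    }

    /// Inverse of `encode`. The reserved byte is not retained in this sketch.
    pub fn parse(b: &[u8; 20]) -> Self {
        let u = |i: usize| u16::from_le_bytes([b[i], b[i + 1]]);
        let s = |i: usize| i16::from_le_bytes([b[i], b[i + 1]]);
        HandleSketch {
            galaxy_id: b[0],
            platform_id: u(2),
            engine_id: u(4),
            object_id: u(6),
            object_signature: u(8),
            primitive_id: s(10),
            attribute_id: s(12),
            property_id: s(14),
            attribute_signature: u(16),
            attribute_index: s(18),
        }
    }
}
```

Under this layout, bytes 6 through 19 (from `object_id` to `attribute_index`) form a 14-byte run, which matches the size of the handle projection used by the item-control and write bodies.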
|
||||
|
||||
`compute_name_signature` mirrors the .NET `ComputeNameSignature` exactly. Lowercase the name with invariant-culture semantics (`str::to_lowercase()` is not a drop-in substitute; see the warning that follows), then feed each UTF-16 code unit of the result through `update_crc16_ibm` (poly `0xa001`, initial `0`), low byte first, then high byte.
|
||||
|
||||
⚠ **Unicode lowercasing must match `String.ToLowerInvariant()` semantics, NOT `str::to_lowercase()`.** The .NET `ToLowerInvariant()` uses the culture-invariant Unicode case map (`CultureInfo.InvariantCulture.TextInfo.ToLower`, ICU-derived). Rust `str::to_lowercase()` is locale-independent, but it applies Unicode's full Default Casing algorithm with no tailoring: one character can expand to several (`İ` lowercases to `i` followed by U+0307) and a word-final `Σ` becomes `ς` rather than `σ`. That is *close to* but not identical to `Invariant`. Worse, `unicase::Ascii::to_lowercase` is **ASCII-only** and silently passes non-ASCII characters through unchanged; for any tag containing a non-ASCII character it will produce a CRC that disagrees with the .NET reference. The Rust port should:
1. Use `icu_casemap::CaseMapper::lowercase_to_string` (ICU4X) configured for `Locale::UND` / Root locale to match `Invariant`, **or** hand-implement Unicode `Default_Lowercase` via the UCD `SpecialCasing.txt` and `UnicodeData.txt` `Lowercase_Mapping` field with no language tailoring.
2. **Not** use `str::to_lowercase()` (full Unicode mapping with multi-character expansions and final-sigma handling) and **not** use `unicase::Ascii::to_lowercase` (ASCII-only; destroys non-ASCII parity).
3. Treat this as a divergence-test requirement: `mxaccess-codec/tests/` must include fixture tag names containing characters whose ASCII-vs-invariant mapping differs (Turkish dotted-İ → `i̇`, German ß → `ss` is *not* applied by `ToLowerInvariant` so `ß → ß`, Greek Σ → σ — confirm against `MxReferenceHandle.cs:47–59` outputs from the .NET reference and assert byte-equal CRC).
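
The CRC step itself is independent of the lowercasing question and can be sketched as follows. This is a bitwise CRC-16/IBM (also known as CRC-16/ARC: reflected polynomial `0xA001`, initial value `0`), written from the parameters stated above rather than ported from the .NET source. `crc16_ibm` and `name_signature_prelowered` are hypothetical helper names; the latter deliberately takes a name that has *already* been lowercased with invariant semantics, so the sketch makes no claim about how that lowercasing is done.

```rust
/// Fold one byte into a running CRC-16/IBM value (reflected, poly 0xA001).
pub fn update_crc16_ibm(mut crc: u16, byte: u8) -> u16 {
    crc ^= byte as u16;
    for _ in 0..8 {
        crc = if crc & 1 != 0 { (crc >> 1) ^ 0xA001 } else { crc >> 1 };
    }
    crc
}

/// CRC-16/IBM over a byte slice, starting from 0.
pub fn crc16_ibm(bytes: &[u8]) -> u16 {
    bytes.iter().fold(0u16, |crc, &b| update_crc16_ibm(crc, b))
}

/// Signature of a name that the caller has already lowercased.
/// Each UTF-16 code unit is fed low byte first, then high byte.
pub fn name_signature_prelowered(lowered: &str) -> u16 {
    lowered.encode_utf16().fold(0u16, |crc, unit| {
        let [lo, hi] = unit.to_le_bytes();
        update_crc16_ibm(update_crc16_ibm(crc, lo), hi)
    })
}
```

The standard check input `"123456789"` (as ASCII bytes) yields `0xBB3D` for this CRC variant, which is a quick way to confirm the polynomial and bit order before comparing against .NET fixture outputs.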
|
||||
|
||||
### `NmxTransferEnvelope` (46 bytes)
|
||||
|
||||
Source: `src/MxNativeCodec/NmxTransferEnvelope.cs:5–104`.
|
||||
|
||||
```rust
#[derive(Debug, Clone, Copy)]
pub struct NmxTransferEnvelope {
    pub version: u16,                         // 1
    pub inner_length: i32,                    // body.len() - 46
    pub reserved6_10: [u8; 4],                // bytes 6..10; preserved verbatim by Rust port — see note below
    pub message_kind: NmxTransferMessageKind, // 1=Metadata, 2=ItemControl, 3=Write
    pub source_galaxy_id: i32,
    pub source_platform_id: i32,
    pub local_engine_id: i32,
    pub target_galaxy_id: i32,
    pub target_platform_id: i32,
    pub target_engine_id: i32,
    pub protocol_marker: i32,                 // 0x0201
    pub timeout_ms: i32,                      // default 30000
}
```
|
||||
|
||||
`NmxTransferEnvelopeTemplate` is the round-trip preserver: takes a captured 46-byte buffer, exposes setters that patch only `inner_length`, leaves every other byte untouched. Used for `ObservedWriteBodyTemplate`. **This is non-optional** — the protocol has bytes whose semantics are not yet decoded; the .NET reference passes them through.
|
||||
|
||||
⚠ **Reserved bytes 6..10 are NOT preserved by the .NET reference parser.** `NmxTransferEnvelope.Parse` reads only Version, InnerLength, ProtocolMarker, MessageKind, and the engine ids; the four bytes at offset 6 are discarded (`src/MxNativeCodec/NmxTransferEnvelope.cs:39–75`), and `Encode` always writes `0` there (`src/MxNativeCodec/NmxTransferEnvelope.cs:91`). The Rust port intentionally **fixes this gap** by carrying `reserved6_10: [u8; 4]` through `parse`/`encode`, defaulting to `[0; 4]` for newly-constructed envelopes. This honours CLAUDE.md's preserve-unknown-bytes rule on a layer the .NET reference does not — the Rust codec is closer to true byte-parity than the .NET parser when round-tripping captured envelopes that have non-zero bytes at offset 6.
|
||||
|
||||
⚠ The native adapter logs `NMX Header ... buffer size pktHeader.dwDataSize N doesn't match received message size of 46` (`work_remain.md:74–85`) when `inner_length` does not match the actual body size. The encoder validates `inner_length == body.len() - 46` before transmitting.
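
A template that patches only `inner_length` and leaves the other 42 bytes untouched might be sketched like this. The names `EnvelopeTemplateSketch` and `EnvelopeError` are hypothetical, and two layout details are assumptions inferred from the struct above rather than cited facts: `inner_length` is taken to occupy bytes 2..6 (directly after the `u16` version and before the reserved bytes at 6..10) and to be little-endian.

```rust
use std::convert::TryFrom;

pub const ENVELOPE_LEN: usize = 46;

#[derive(Debug, PartialEq, Eq)]
pub enum EnvelopeError {
    /// The full message is shorter than the envelope itself.
    BodyTooShort { len: usize },
    /// The inner length does not fit the signed 32-bit wire field.
    BodyTooLong { len: usize },
}

/// Holds a captured envelope verbatim, unknown bytes included.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EnvelopeTemplateSketch {
    raw: [u8; ENVELOPE_LEN],
}

impl EnvelopeTemplateSketch {
    pub fn from_captured(raw: [u8; ENVELOPE_LEN]) -> Self {
        Self { raw }
    }

    /// Envelope bytes for a message whose total length (envelope plus
    /// payload) is `body_len`. Only the inner-length field is rewritten.
    pub fn encode_for_body(&self, body_len: usize) -> Result<[u8; ENVELOPE_LEN], EnvelopeError> {
        let inner = body_len
            .checked_sub(ENVELOPE_LEN)
            .ok_or(EnvelopeError::BodyTooShort { len: body_len })?;
        let inner = i32::try_from(inner).map_err(|_| EnvelopeError::BodyTooLong { len: body_len })?;
        let mut out = self.raw;
        out[2..6].copy_from_slice(&inner.to_le_bytes()); // assumed offset and endianness
        Ok(out)
    }
}
```

Deriving the field from the real body length at encode time is one way to make the `inner_length == body.len() - 46` check hold by construction instead of validating it after the fact.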
|
||||
|
||||
### Item-control bodies (advise / supervisory / unadvise)
|
||||
|
||||
| Command | Opcode | Length |
|---|---|---|
| AdviseSupervisory | `0x1f` | 39 bytes (HeaderLength 3 + GUID 16 + AdviseExtra 2 + Payload 18) |
| UnAdvise | `0x21` | 37 bytes (HeaderLength 3 + GUID 16 + Payload 18) |
|
||||
|
||||
The `Advise` enum value shares opcode `0x1f` with `AdviseSupervisory` (`src/MxNativeCodec/NmxItemControlMessage.cs:7–8`), but `NmxItemControlMessage.Parse` only accepts `AdviseSupervisory` or `UnAdvise` (`src/MxNativeCodec/NmxItemControlMessage.cs:46–49`). There is **no** separate 37-byte plain-Advise wire shape — the higher-level `MxNativeCompatibilityServer.AdviseSupervisory` simply forwards to `Advise`, both of which encode an `AdviseSupervisory` 39-byte body (`src/MxNativeClient/MxNativeCompatibilityServer.cs:256–258`).
|
||||
|
||||
Layout: `command(1) + version(2) + correlation_id(GUID 16) + [extra(2) when AdviseSupervisory] + handle_projection(14) + tail(4)` (`src/MxNativeCodec/NmxItemControlMessage.cs:25–35,121–142`).
|
||||
|
||||
Source: `src/MxNativeCodec/NmxItemControlMessage.cs:5–154`.
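
The layout above can be turned into a small encoder as a size check. This sketch is not a port of `NmxItemControlMessage.cs`: `ItemControlCommand` and `encode_item_control` are hypothetical names, the `extra` and `tail` bytes are passed through as opaque caller-supplied values because their contents are not described here, and a little-endian `version` is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ItemControlCommand {
    AdviseSupervisory = 0x1f,
    UnAdvise = 0x21,
}

/// command(1) + version(2) + correlation_id(16)
/// + [extra(2), AdviseSupervisory only] + handle_projection(14) + tail(4)
pub fn encode_item_control(
    command: ItemControlCommand,
    version: u16,
    correlation_id: [u8; 16],
    advise_extra: [u8; 2],
    handle_projection: [u8; 14],
    tail: [u8; 4],
) -> Vec<u8> {
    let mut body = Vec::with_capacity(39);
    body.push(command as u8);
    body.extend_from_slice(&version.to_le_bytes()); // endianness assumed
    body.extend_from_slice(&correlation_id);
    if command == ItemControlCommand::AdviseSupervisory {
        // The two extra bytes exist only on the advise shape.
        body.extend_from_slice(&advise_extra);
    }
    body.extend_from_slice(&handle_projection);
    body.extend_from_slice(&tail);
    body
}
```

With these inputs the two commands produce 39 and 37 bytes respectively, matching the table.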
|
||||
|
||||
### Write bodies (`0x37` / `0x38`)
|
||||
|
||||
Common prefix (18 bytes) ends in a wire-kind byte at offset 17. Layout of the prefix is `cmd(1) + version u16(2) + handle_projection(14, bytes 6..19 of MxReferenceHandle) + wireKind(1)` (`src/MxNativeCodec/NmxWriteMessage.cs:11–13,207–213`). There is no padding between version and the handle projection.
|
||||
|
||||
| WireKind | Type | Total scalar size | Notes |
|---|---|---|---|
| `0x01` | Boolean | 37 | 4-byte value `[0xff,0xff,0xff,0x00]` (true) or `[0x00,0xff,0xff,0x00]` (false) — bytes 1 and 2 are literal `0xFF` filler, NOT reserved zeros (`src/MxNativeCodec/NmxWriteMessage.cs:257`); 11-byte boolean suffix (7 zero bytes + 4-byte clientToken, `NmxWriteMessage.cs:228–238`) + 4-byte writeIndex |
| `0x02` | Int32 | 40 | 4 LE + 14-byte suffix + 4 writeIndex |
| `0x03` | Float32 | 40 | 4 IEEE + 14 + 4 |
| `0x04` | Float64 | 44 | 8 IEEE + 14 + 4 |
| `0x05` | String | 44 + N | recordLength i32(4) + valueByteLength i32(4) + UTF-16LE bytes(N) + null(2) + 14-byte suffix + 4-byte writeIndex; total = `KindOffset(17) + 1 + 4 + 4 + N + 14 + 4` (`src/MxNativeCodec/NmxWriteMessage.cs:148–157`) |
| `0x05` | DateTime | 44 + N | Same shape as String. Value is UTF-16LE of `DateTime.ToString("M/d/yyyy h:mm:ss tt", InvariantCulture)` + null (`src/MxNativeCodec/NmxWriteMessage.cs:262,390–393`) |
| `0x41–0x45` | Arrays | 18 + 10 + N + 18 | prefix(18) + 4 unused bytes + count u16 at body[22] + elementWidth u16 at body[24] + elements(N) + 14-byte suffix + 4-byte writeIndex (`src/MxNativeCodec/NmxWriteMessage.cs:179–186`) |
|
||||
|
||||
`Write2` (timestamped) replaces the `-1 i16` flag with `0 i16` and inserts an 8-byte `FILETIME` (`DateTime.ToFileTime()`) between offsets 12 and 19 of the suffix.
|
||||
|
||||
`WriteSecured2` (`0x38`) appends `currentUserToken(16) + clientNameLen(i32) + clientNameBytes(UTF-16LE+null) + verifierUserToken(16)` before the trailing index slot.
|
||||
|
||||
Sources: `src/MxNativeCodec/NmxWriteMessage.cs:7–394`, `NmxSecuredWrite2Message.cs:6–105`. Per-type byte matrices in `analysis/frida/write-body-matrix.tsv`, `write-array-body-matrix.tsv`, `write-mode-matrix.tsv`.
|
||||
|
||||
`ObservedWriteBodyTemplate` mirrors the .NET helper: take a captured write body, replace only the value slot, preserve every other byte (suffix, tokens, padding).
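
The fixed-size scalar rows of the table can be cross-checked with a small encoder. This is a sketch under assumptions, not the `NmxWriteMessage.cs` logic: `ScalarValue` and `encode_scalar_write` are hypothetical names, the suffix and write-index bytes are treated as opaque caller-supplied data (in practice they would come from a captured body), and little-endian encoding is assumed for `version` and the floating-point values.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ScalarValue {
    Boolean(bool),
    Int32(i32),
    Float32(f32),
    Float64(f64),
}

impl ScalarValue {
    fn wire_kind(self) -> u8 {
        match self {
            ScalarValue::Boolean(_) => 0x01,
            ScalarValue::Int32(_) => 0x02,
            ScalarValue::Float32(_) => 0x03,
            ScalarValue::Float64(_) => 0x04,
        }
    }

    fn value_bytes(self) -> Vec<u8> {
        match self {
            // Bytes 1 and 2 are 0xFF filler, per the Boolean row.
            ScalarValue::Boolean(true) => vec![0xff, 0xff, 0xff, 0x00],
            ScalarValue::Boolean(false) => vec![0x00, 0xff, 0xff, 0x00],
            ScalarValue::Int32(v) => v.to_le_bytes().to_vec(),
            ScalarValue::Float32(v) => v.to_le_bytes().to_vec(),
            ScalarValue::Float64(v) => v.to_le_bytes().to_vec(),
        }
    }

    /// Boolean carries an 11-byte suffix, the other scalars 14 bytes.
    fn suffix_len(self) -> usize {
        if let ScalarValue::Boolean(_) = self { 11 } else { 14 }
    }
}

/// prefix(18) + value + suffix + writeIndex(4). Returns `None` when the
/// supplied suffix has the wrong length for the value's kind.
pub fn encode_scalar_write(
    cmd: u8,
    version: u16,
    handle_projection: &[u8; 14],
    value: ScalarValue,
    suffix: &[u8],
    write_index: [u8; 4],
) -> Option<Vec<u8>> {
    if suffix.len() != value.suffix_len() {
        return None;
    }
    let mut body = Vec::with_capacity(44);
    body.push(cmd);
    body.extend_from_slice(&version.to_le_bytes());
    body.extend_from_slice(handle_projection);
    body.push(value.wire_kind()); // lands at offset 17
    body.extend_from_slice(&value.value_bytes());
    body.extend_from_slice(suffix);
    body.extend_from_slice(&write_index);
    Some(body)
}
```

The resulting lengths (37, 40, 40, 44) agree with the "Total scalar size" column; variable-length kinds and the `Write2` / `WriteSecured2` variants are out of scope for this sketch.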
|
||||
|
||||
### Subscription Status (`0x32`) and DataUpdate (`0x33`)
|
||||
|
||||
Source: `src/MxNativeCodec/NmxSubscriptionMessage.cs:5–428`.
|
||||
|
||||
**The two frames share a 23-byte common header (`cmd + version + recordCount + operationId`) but diverge immediately after.** The parser must dispatch on `cmd` before reading any further bytes; do not unify the two paths.
|
||||
|
||||
`SubscriptionStatus` (cmd `0x32`) — header → **per-message correlationId** → records (`src/MxNativeCodec/NmxSubscriptionMessage.cs:87–115`):
|
||||
```
cmd(1=0x32) + version(2=1) + recordCount(i32) + operationId(GUID 16)   [bytes 0..23]
correlationId(GUID 16)                                                 [bytes 23..39]
records[recordCount]                                                   [from byte 39]
  record: status(i32) + detailStatus(i32) + quality(u16)
          + timestamp_filetime(i64) + wireKind(u8) + value(N)
```
|
||||
|
||||
`DataUpdate` (cmd `0x33`) — header → records, **no correlationId** (`src/MxNativeCodec/NmxSubscriptionMessage.cs:54–55, 65–85`):
|
||||
```
cmd(1=0x33) + version(2=1) + recordCount(i32) + operationId(GUID 16)   [bytes 0..23]
records[recordCount]                                                   [from byte 23]
  record: status(i32) + quality(u16) + timestamp_filetime(i64)
          + wireKind(u8) + value(N)
```
|
||||
|
||||
⚠ **`recordCount != 1` is a hard error on `0x33` DataUpdate.** The .NET parser throws `ArgumentException` (`src/MxNativeCodec/NmxSubscriptionMessage.cs:71–74`). The Rust port replicates this as a typed error (do not silently accept multi-record DataUpdate frames); buffered batches are listed in `70-risks-and-open-questions.md` (R2/R13) as not yet wire-proven.
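
The dispatch-then-diverge rule and the `recordCount` check can be sketched as a header splitter. This is illustrative only: `Frame`, `FrameError`, and `split_frame` are hypothetical names, record bodies are returned as undecoded slices, and a little-endian `recordCount` is an assumption.

```rust
pub const COMMON_HEADER_LEN: usize = 23; // cmd(1) + version(2) + recordCount(4) + operationId(16)

#[derive(Debug, PartialEq, Eq)]
pub enum Frame<'a> {
    SubscriptionStatus {
        record_count: i32,
        operation_id: [u8; 16],
        correlation_id: [u8; 16],
        records: &'a [u8],
    },
    DataUpdate {
        operation_id: [u8; 16],
        record: &'a [u8],
    },
}

#[derive(Debug, PartialEq, Eq)]
pub enum FrameError {
    Truncated,
    UnknownCommand(u8),
    /// DataUpdate frames must carry exactly one record.
    UnexpectedRecordCount(i32),
}

pub fn split_frame(buf: &[u8]) -> Result<Frame<'_>, FrameError> {
    if buf.len() < COMMON_HEADER_LEN {
        return Err(FrameError::Truncated);
    }
    let record_count = i32::from_le_bytes([buf[3], buf[4], buf[5], buf[6]]);
    let mut operation_id = [0u8; 16];
    operation_id.copy_from_slice(&buf[7..23]);

    // Dispatch on the command byte before touching anything past the header.
    match buf[0] {
        0x32 => {
            if buf.len() < 39 {
                return Err(FrameError::Truncated);
            }
            let mut correlation_id = [0u8; 16];
            correlation_id.copy_from_slice(&buf[23..39]);
            Ok(Frame::SubscriptionStatus { record_count, operation_id, correlation_id, records: &buf[39..] })
        }
        0x33 => {
            if record_count != 1 {
                return Err(FrameError::UnexpectedRecordCount(record_count));
            }
            Ok(Frame::DataUpdate { operation_id, record: &buf[23..] })
        }
        other => Err(FrameError::UnknownCommand(other)),
    }
}
```

Keeping the two arms separate makes it hard to accidentally read a correlation id out of a `0x33` frame, which has none.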
|
||||
|
||||
Wire kinds are 0x01..0x07 (scalars) and 0x41..0x46 (arrays).
|
||||
|
||||
ⓘ **Known gap — wire kind `0x47` (`ElapsedTimeArray`) is not enumerated by either the .NET reference or this design.** The scalar `ElapsedTime` (`0x07`, `src/MxNativeCodec/MxValueKind.cs`) has no array counterpart in either `MxValueKind` or `MxValue`, and neither `NmxWriteMessage.cs` (encoder) nor `NmxSubscriptionMessage.cs:270–276` (decoder) has a branch for `0x47`. If a future Frida capture exposes such a frame, both sides need an additive enum variant (`ElapsedTimeArray` → `0x47`) plus a value carrier; the current parser will fall through to the default `(wireKind, null, 0)` opaque arm and the encoder will simply have no way to emit it. Document as a known gap rather than silently extending the enum without evidence.
|
||||
|
||||
⚠ **Encoder/decoder asymmetry on the array kind byte (preserve verbatim):** the write encoder collapses both `StringArray` and `DateTimeArray` to `0x45` and never emits `0x46` (`src/MxNativeCodec/NmxWriteMessage.cs:107`). The subscription/callback decoder treats `0x46` as `DateTimeArray` (`src/MxNativeCodec/NmxSubscriptionMessage.cs:173,275`). Therefore: **writes use `0x41..0x45` only; reads accept `0x41..0x46`.** The Rust port's encoder must match (only emit up to `0x45`); the decoder must accept the full `0x41..0x46` range and demux `StringArray` vs `DateTimeArray` from `wireKind`, not from any encoder-side metadata.
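
The asymmetry can be made explicit by keeping the write-side and read-side mappings as two separate functions. `ArrayKind`, `write_wire_kind`, and `decode_array_kind` are hypothetical names. The `0x45` / `0x46` handling follows the paragraph above; the assignment of `0x41..0x44` to Boolean, Int32, Float32, and Float64 arrays is an assumption that mirrors the scalar kind order.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArrayKind {
    Boolean,
    Int32,
    Float32,
    Float64,
    String,
    DateTime,
}

/// Kind byte emitted on writes: DateTime arrays collapse onto 0x45,
/// so 0x46 is never produced by the encoder.
pub fn write_wire_kind(kind: ArrayKind) -> u8 {
    match kind {
        ArrayKind::Boolean => 0x41,
        ArrayKind::Int32 => 0x42,
        ArrayKind::Float32 => 0x43,
        ArrayKind::Float64 => 0x44,
        ArrayKind::String | ArrayKind::DateTime => 0x45,
    }
}

/// Kind byte accepted on subscription/callback frames: the full
/// 0x41..=0x46 range, with 0x46 meaning DateTime array.
pub fn decode_array_kind(wire_kind: u8) -> Option<ArrayKind> {
    match wire_kind {
        0x41 => Some(ArrayKind::Boolean),
        0x42 => Some(ArrayKind::Int32),
        0x43 => Some(ArrayKind::Float32),
        0x44 => Some(ArrayKind::Float64),
        0x45 => Some(ArrayKind::String),
        0x46 => Some(ArrayKind::DateTime),
        _ => None,
    }
}
```

Because the two functions are not inverses of each other, a round-trip property test should be written against captured frames per direction, not as `decode(encode(kind)) == kind`.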
|
||||
|
||||
### Reference registration (`0x10` / `0x11`)
|
||||
|
||||
Source: `src/MxNativeCodec/NmxReferenceRegistrationMessage.cs:6–142`, `NmxReferenceRegistrationResultMessage.cs:6–120`.
|
||||
|
||||
Tagged-string encoding: 4-byte length prefix where `tagged ? (byteLength | 0x81000000) : byteLength`, followed by UTF-16LE bytes plus a null terminator. Codec preserves the 8 zero bytes of `ItemStringReservedLength` (lines 42–45) and the `0x81000000` marker on tagged strings.
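
A sketch of the length-prefix rule only. Function names are hypothetical. Little-endian byte order and the 24-bit length mask on decode are assumptions. Whether `byteLength` counts the null terminator is deliberately left to the caller, because the text above does not say.

```rust
/// Marker OR-ed into the length prefix of tagged strings.
pub const TAGGED_MARKER: u32 = 0x8100_0000;

/// Builds the 4-byte prefix: `tagged ? (byteLength | 0x81000000) : byteLength`.
/// Assumes little-endian encoding.
pub fn encode_length_prefix(byte_length: u32, tagged: bool) -> [u8; 4] {
    let raw = if tagged { byte_length | TAGGED_MARKER } else { byte_length };
    raw.to_le_bytes()
}

/// Splits a prefix back into `(byte_length, tagged)`.
/// Assumption: a tagged prefix is recognised by its top byte being 0x81,
/// and the length occupies the low 24 bits.
pub fn decode_length_prefix(prefix: [u8; 4]) -> (u32, bool) {
    let raw = u32::from_le_bytes(prefix);
    let tagged = (raw & 0xFF00_0000) == TAGGED_MARKER;
    let byte_length = if tagged { raw & 0x00FF_FFFF } else { raw };
    (byte_length, tagged)
}
```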
|
||||
|
||||
### Type model
|
||||
|
||||
Compatible with the .NET enums in `src/MxNativeCodec/MxStatus.cs`, `MxValueKind.cs`, `MxDataType.cs`.
|
||||
|
||||
```rust
#[repr(i16)]
pub enum MxStatusCategory {
    Unknown = -1, Ok = 0, Pending = 1, Warning = 2, CommunicationError = 3,
    ConfigurationError = 4, OperationalError = 5, SecurityError = 6,
    SoftwareError = 7, OtherError = 8,
}

#[repr(i16)]
pub enum MxStatusSource {
    Unknown = -1, RequestingLmx = 0, RespondingLmx = 1, RequestingNmx = 2,
    RespondingNmx = 3, RequestingAutomationObject = 4, RespondingAutomationObject = 5,
}

pub struct MxStatus {
    pub success: i16,
    pub category: MxStatusCategory,
    pub detected_by: MxStatusSource, // .NET field name is `DetectedBy` (`src/MxNativeCodec/MxStatus.cs:31`); not `Source`
    pub detail: i16,                 // i16, signed; matches .NET `short Detail` (`src/MxNativeCodec/MxStatus.cs:32`)
}

pub enum MxValueKind {
    Boolean, Int32, Float32, Float64, String, DateTime, ElapsedTime,
    BooleanArray, Int32Array, Float32Array, Float64Array,
    StringArray, DateTimeArray,
    // No ElapsedTimeArray variant; see the known-gap note on wire kind 0x47.
}

#[repr(i16)]
pub enum MxDataType {
    Unknown = -1, NoData = 0, Boolean = 1, Integer = 2, Float = 3, Double = 4,
    String = 5, Time = 6, ElapsedTime = 7, ReferenceType = 8, StatusType = 9,
    Enum = 10, SecurityClassificationEnum = 11, DataQualityType = 12,
    QualifiedEnum = 13, QualifiedStruct = 14, InternationalizedString = 15,
    BigString = 16, End = 17,
}
```
|
||||
|
||||
### `MxValue` — typed value carrier
|
||||
|
||||
```rust
use std::time::SystemTime;

pub enum MxValue {
    Boolean(bool),
    Int32(i32),
    Float32(f32),
    Float64(f64),
    String(String),
    DateTime(SystemTime),       // FILETIME at codec boundary
    ElapsedTime(MxElapsedTime), // signed wire — see note below
    BooleanArray(Vec<bool>),
    Int32Array(Vec<i32>),
    Float32Array(Vec<f32>),
    Float64Array(Vec<f64>),
    StringArray(Vec<String>),
    DateTimeArray(Vec<SystemTime>),
}
```
|
||||
|
||||
Conversions:
|
||||
- `SystemTime` ↔ FILETIME: 100-ns intervals since `1601-01-01T00:00:00Z`. `time::OffsetDateTime` is also acceptable; pick one and stay consistent. The codec layer accepts both via traits.
|
||||
- `ElapsedTime`: **4-byte signed `i32` milliseconds** on the wire. The .NET decoder reads `BinaryPrimitives.ReadInt32LittleEndian` and produces `TimeSpan.FromMilliseconds(milliseconds)` (`src/MxNativeCodec/NmxSubscriptionMessage.cs:252`), which accepts negative values. `std::time::Duration` is unsigned and **must not** be used here: it cannot represent a negative millisecond count, so a negative wire value either fails a checked conversion or, through `Duration::from_secs_f64`, panics. The Rust port exposes a newtype `pub struct MxElapsedTime(pub i64);` (milliseconds, signed). It uses `i64` rather than `i32` so it can also carry the wider `time::Duration`/.NET `TimeSpan` range without truncation when promoted at the async layer.
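
The two conversions above can be sketched with the standard library alone. The function names and the `from_wire`/`to_wire` methods are hypothetical. The epoch offset between 1601-01-01 and 1970-01-01 is the well-known 11,644,473,600 seconds. Treating FILETIME as a signed `i64` follows the `timestamp_filetime(i64)` field shown earlier.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// 100-ns ticks between 1601-01-01T00:00:00Z and 1970-01-01T00:00:00Z.
const FILETIME_UNIX_EPOCH_TICKS: i64 = 116_444_736_000_000_000;

/// FILETIME (100-ns ticks since 1601) to `SystemTime`; `None` if out of range.
pub fn filetime_to_system_time(filetime: i64) -> Option<SystemTime> {
    let delta = filetime.checked_sub(FILETIME_UNIX_EPOCH_TICKS)?;
    let ticks = delta.unsigned_abs();
    let dur = Duration::new(ticks / 10_000_000, ((ticks % 10_000_000) * 100) as u32);
    if delta >= 0 {
        UNIX_EPOCH.checked_add(dur)
    } else {
        UNIX_EPOCH.checked_sub(dur)
    }
}

/// `SystemTime` to FILETIME; sub-100-ns precision is truncated.
pub fn system_time_to_filetime(t: SystemTime) -> Option<i64> {
    let ticks_since_unix = match t.duration_since(UNIX_EPOCH) {
        Ok(d) => i64::try_from(d.as_nanos() / 100).ok()?,
        Err(e) => -i64::try_from(e.duration().as_nanos() / 100).ok()?,
    };
    ticks_since_unix.checked_add(FILETIME_UNIX_EPOCH_TICKS)
}

/// Signed milliseconds, as described in the text above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct MxElapsedTime(pub i64);

impl MxElapsedTime {
    /// Decodes the 4-byte little-endian signed millisecond field.
    pub fn from_wire(bytes: [u8; 4]) -> Self {
        MxElapsedTime(i64::from(i32::from_le_bytes(bytes)))
    }

    /// Encodes back to the wire; `None` if the value exceeds the i32 wire range.
    pub fn to_wire(self) -> Option<[u8; 4]> {
        i32::try_from(self.0).ok().map(i32::to_le_bytes)
    }
}
```

Both directions return `Option` rather than panicking, which fits the "no panics in the public surface" rule stated later in this document.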
|
||||
|
||||
ASB variant lives in a parallel `mxaccess_codec::asb::AsbVariant` because it is wire-incompatible (different type-id space and binary layout).
|
||||
|
||||
### Preservation rules
|
||||
|
||||
Every codec type that decodes a message keeps a private `original: Bytes` field. `re_encode()` returns the original bytes when no fields were mutated. This guarantees byte parity on captured fixtures even when fields' meanings are not fully understood.
|
||||
|
||||
For mutable round-trips (e.g. re-encoding a captured `Write2` with a new value), only mutated fields are re-emitted; the rest of the buffer is patched in place. This matches `ObservedWriteBodyTemplate` in the .NET reference and is essential for the parity test strategy described in `60-roadmap.md`.
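
The preservation rule can be sketched as follows. `DecodedMessage`, its fields, and `set_i32` are hypothetical, a `Vec<u8>` stands in for `bytes::Bytes`, and a single little-endian `i32` at a known offset stands in for "a mutated field". The real types carry more structure.

```rust
/// Keeps the captured bytes verbatim and patches only what was mutated.
pub struct DecodedMessage {
    original: Vec<u8>,        // stand-in for `bytes::Bytes`
    value_offset: usize,      // where the (hypothetical) i32 value field lives
    patched: Option<Vec<u8>>, // `None` until a field is mutated
}

impl DecodedMessage {
    /// "Decodes" by remembering the buffer and the value field's offset.
    pub fn decode(bytes: &[u8], value_offset: usize) -> Option<Self> {
        let end = value_offset.checked_add(4)?;
        if end > bytes.len() {
            return None;
        }
        Some(Self { original: bytes.to_vec(), value_offset, patched: None })
    }

    /// Mutates the value field in place; every other byte is left untouched.
    pub fn set_i32(&mut self, value: i32) {
        let mut buf = self.patched.take().unwrap_or_else(|| self.original.clone());
        buf[self.value_offset..self.value_offset + 4].copy_from_slice(&value.to_le_bytes());
        self.patched = Some(buf);
    }

    /// Returns the original bytes when nothing was mutated.
    pub fn re_encode(&self) -> &[u8] {
        self.patched.as_deref().unwrap_or(self.original.as_slice())
    }
}
```

Because unknown bytes are never re-synthesised, a fixture whose trailing fields are not yet understood still round-trips byte for byte.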
|
||||
|
||||
### Test strategy (codec)
|
||||
|
||||
- **Round-trip fixtures**: every captured write/advise/subscribe body in `captures/0NN-frida-*` and every row in `analysis/frida/*-matrix.tsv` is loaded, decoded, re-encoded, and asserted byte-equal.
|
||||
- **Property tests**: `proptest` generators for each primitive (`MxReferenceHandle`, envelope, write body) — encode then decode, assert structural equality.
|
||||
- **CRC vectors**: hardcoded vectors from .NET unit tests are mirrored as Rust constants and asserted.
|
||||
- **Cross-implementation**: a small fixture runner shells out to `dotnet run --project src\MxNativeCodec.Tests` and asserts the same bytes are produced.
|
||||
|
||||
## `mxaccess-rpc`
|
||||
|
||||
DCE/RPC over TCP, NTLMv2 packet integrity, OXID resolution, OBJREF parsing, IRemUnknown::RemQueryInterface. The minimum subset of [MS-RPCE], [MS-DCOM], and [MS-NLMP] required to drive `INmxService2`.
|
||||
|
||||
### NTLM
|
||||
|
||||
Source: `src/MxNativeClient/ManagedNtlmClientContext.cs:1–389`. Implements:
|
||||
|
||||
- **Type1 (Negotiate)** — emit. Negotiate flags as set by `CreateType1` (`src/MxNativeClient/ManagedNtlmClientContext.cs:53–63`): `KeyExchange (0x40000000) | Sign (0x00000010) | AlwaysSign (0x00008000) | Seal (0x00000020) | TargetInfo (0x00800000) | Ntlm (0x00000200) | ExtendedSessionSecurity (0x00080000) | Unicode (0x00000001) | RequestTarget (0x00000004) | Negotiate128 (0x20000000) | Negotiate56 (0x80000000)`. Bit constants per `ManagedNtlmClientContext.cs:10–21`.
|
||||
- **Type2 (Challenge)** — parse. Extract server challenge, AV pairs from TargetInfo (timestamp, channel binding optional).
|
||||
- **Type3 (Authenticate)** — emit. NTLMv2 NT-OWF = HMAC-MD5(MD4(unicode(password)), unicode(uppercase(user) + domain)). Client challenge with AV pairs replayed from the Type2.
|
||||
- **Sign / Verify** — packet-integrity signature: HMAC-MD5(SignKey, sequence || plaintext) → first 8 bytes XOR with RC4 keystream.
|
||||
- **Seal / Unseal** — RC4 stream cipher with derived seal key.
|
||||
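- **Sign-key / seal-key** — derived via MD5 over the exported session key concatenated with a direction-specific magic-constant string (client-to-server and server-to-server-to-client keys differ only in that constant).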
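
As a worked check of the Type1 flag set listed above, the constants can be OR-ed together. The constant names below are illustrative (the .NET source may spell them differently). The values are the ones quoted in the Type1 bullet.

```rust
pub const NEGOTIATE_UNICODE: u32 = 0x0000_0001;
pub const REQUEST_TARGET: u32 = 0x0000_0004;
pub const NEGOTIATE_SIGN: u32 = 0x0000_0010;
pub const NEGOTIATE_SEAL: u32 = 0x0000_0020;
pub const NEGOTIATE_NTLM: u32 = 0x0000_0200;
pub const NEGOTIATE_ALWAYS_SIGN: u32 = 0x0000_8000;
pub const NEGOTIATE_EXTENDED_SESSION_SECURITY: u32 = 0x0008_0000;
pub const NEGOTIATE_TARGET_INFO: u32 = 0x0080_0000;
pub const NEGOTIATE_128: u32 = 0x2000_0000;
pub const NEGOTIATE_KEY_EXCHANGE: u32 = 0x4000_0000;
pub const NEGOTIATE_56: u32 = 0x8000_0000;

/// The combined Type1 negotiate flags, per the list in the text.
pub const TYPE1_FLAGS: u32 = NEGOTIATE_KEY_EXCHANGE
    | NEGOTIATE_SIGN
    | NEGOTIATE_ALWAYS_SIGN
    | NEGOTIATE_SEAL
    | NEGOTIATE_TARGET_INFO
    | NEGOTIATE_NTLM
    | NEGOTIATE_EXTENDED_SESSION_SECURITY
    | NEGOTIATE_UNICODE
    | REQUEST_TARGET
    | NEGOTIATE_128
    | NEGOTIATE_56;

/// NTLM messages are little-endian, so this is the on-wire flag field.
pub fn type1_flag_bytes() -> [u8; 4] {
    TYPE1_FLAGS.to_le_bytes()
}
```

The combined value is `0xE0888235`, which gives a convenient single constant to compare against a captured Type1 message.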
|
||||
|
||||
Rust crates: `hmac`, `md-5`, `rc4` (or hand-rolled), `rand_core`. **Do not pull `ring`** — it does not implement MD4. Hand-roll MD4 (~30 lines, mirroring the .NET reference).
|
||||
|
||||
### DCE/RPC PDU
|
||||
|
||||
Source: `src/MxNativeClient/DceRpcPdu.cs:1–380`, `DceRpcTcpClient.cs:1–420`.
|
||||
|
||||
PDU types implemented: `Bind` (11), `BindAck` (12), `Request` (0), `Response` (2), `Fault` (3), `AlterContext` (14), `AlterContextResponse` (15), `Auth3` (16). Authentication trailer: type=NTLMSSP (10), level=PKT_INTEGRITY (5).
|
||||
|
||||
Fragmentation: max transmit/receive 4280 bytes. Multi-fragment Request/Response bodies concatenate in order using the FIRST (0x01) and LAST (0x02) fragment flag bits.
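
A sketch of in-order reassembly driven by the FIRST/LAST flag bits. `Reassembler` and `ReassemblyError` are hypothetical names, and the sketch takes already-extracted stub data per fragment (header parsing and auth-trailer stripping are out of scope here).

```rust
pub const PFC_FIRST_FRAG: u8 = 0x01;
pub const PFC_LAST_FRAG: u8 = 0x02;

#[derive(Debug, PartialEq, Eq)]
pub enum ReassemblyError {
    /// A continuation fragment arrived with no FIRST fragment pending.
    MissingFirst,
    /// A FIRST fragment arrived while a body was still being assembled.
    UnexpectedFirst,
}

/// Concatenates fragment bodies in arrival order.
#[derive(Default)]
pub struct Reassembler {
    pending: Option<Vec<u8>>,
}

impl Reassembler {
    /// Feeds one fragment; returns the complete body once LAST is seen.
    pub fn push(
        &mut self,
        pfc_flags: u8,
        stub: &[u8],
    ) -> Result<Option<Vec<u8>>, ReassemblyError> {
        let first = (pfc_flags & PFC_FIRST_FRAG) != 0;
        let last = (pfc_flags & PFC_LAST_FRAG) != 0;
        match (first, self.pending.is_some()) {
            (true, true) => return Err(ReassemblyError::UnexpectedFirst),
            (false, false) => return Err(ReassemblyError::MissingFirst),
            (true, false) => self.pending = Some(Vec::new()),
            (false, true) => {}
        }
        let buf = self.pending.as_mut().expect("pending buffer set above");
        buf.extend_from_slice(stub);
        Ok(if last { self.pending.take() } else { None })
    }
}
```

A single-fragment PDU carries both bits (`0x03`) and completes on the first `push`.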
|
||||
|
||||
### OXID resolution
|
||||
|
||||
`IObjectExporter::ResolveOxid` over port 135. Returns dual-string bindings; we filter for tower id `0x0007` (ncacn_ip_tcp) and parse `host[port]`.
|
||||
|
||||
Source: `src/MxNativeClient/ObjectExporterClient.cs:1–82`.
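
The filter-and-parse step can be sketched as below. `StringBinding`, `parse_tcp_binding`, and `first_tcp_endpoint` are hypothetical names. The sketch assumes the network address has already been decoded from UTF-16 into a `&str`.

```rust
/// Tower id for ncacn_ip_tcp, as stated in the text.
pub const TOWER_ID_NCACN_IP_TCP: u16 = 0x0007;

/// One decoded entry of the dual-string array (hypothetical shape).
pub struct StringBinding<'a> {
    pub tower_id: u16,
    pub network_address: &'a str,
}

/// Splits `host[port]` into its parts; `None` if the shape does not match.
pub fn parse_tcp_binding(binding: &str) -> Option<(&str, u16)> {
    let open = binding.rfind('[')?;
    let without_close = binding.strip_suffix(']')?;
    let host = &binding[..open];
    if host.is_empty() {
        return None;
    }
    let port = without_close[open + 1..].parse::<u16>().ok()?;
    Some((host, port))
}

/// Returns the first ncacn_ip_tcp binding that parses as `host[port]`.
pub fn first_tcp_endpoint<'a>(bindings: &[StringBinding<'a>]) -> Option<(&'a str, u16)> {
    bindings
        .iter()
        .filter(|b| b.tower_id == TOWER_ID_NCACN_IP_TCP)
        .find_map(|b| parse_tcp_binding(b.network_address))
}
```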
|
||||
|
||||
### OBJREF / IRemUnknown
|
||||
|
||||
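A standard OBJREF is a 68-byte fixed prefix followed by the dual-string array entries: the 24-byte OBJREF header (signature, flags, IID), the 40-byte STDOBJREF, and the 4-byte dual-string array header. Signature `MEOW` = `0x574F454D` (little-endian).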
|
||||
|
||||
`IRemUnknown::RemQueryInterface` (opnum 3) yields a new IPID for a different IID on the same OXID. Used to obtain `INmxService2` from the activated `IUnknown`.
|
||||
|
||||
Source: `src/MxNativeClient/ComObjRef.cs:1–145`, `RemUnknownMessages.cs:1–79`.
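
A sketch of parsing the fixed part of a standard OBJREF, following the field layout in [MS-DCOM]. `StandardObjRef` and `parse_standard_objref` are hypothetical names and are not taken from `ComObjRef.cs`. The dual-string array entries after byte 68 are not parsed here.

```rust
/// "MEOW" read as a little-endian u32.
pub const OBJREF_SIGNATURE: u32 = 0x574F_454D;
/// OBJREF flags value for a standard object reference.
pub const OBJREF_STANDARD: u32 = 0x0000_0001;
/// OBJREF header (24) + STDOBJREF (40) + dual-string array header (4).
pub const OBJREF_FIXED_LEN: usize = 68;

#[derive(Debug, PartialEq, Eq)]
pub struct StandardObjRef {
    pub iid: [u8; 16],
    pub oxid: u64,
    pub oid: u64,
    pub ipid: [u8; 16],
    pub num_entries: u16,
    pub security_offset: u16,
}

/// Copies `N` bytes starting at `at`; callers check the length first.
fn array<const N: usize>(buf: &[u8], at: usize) -> [u8; N] {
    let mut out = [0u8; N];
    out.copy_from_slice(&buf[at..at + N]);
    out
}

pub fn parse_standard_objref(buf: &[u8]) -> Option<StandardObjRef> {
    if buf.len() < OBJREF_FIXED_LEN {
        return None;
    }
    if u32::from_le_bytes(array(buf, 0)) != OBJREF_SIGNATURE {
        return None;
    }
    if u32::from_le_bytes(array(buf, 4)) != OBJREF_STANDARD {
        return None;
    }
    Some(StandardObjRef {
        iid: array(buf, 8),
        // STDOBJREF: flags at 24 and cPublicRefs at 28 are skipped here.
        oxid: u64::from_le_bytes(array(buf, 32)),
        oid: u64::from_le_bytes(array(buf, 40)),
        ipid: array(buf, 48),
        num_entries: u16::from_le_bytes(array(buf, 64)),
        security_offset: u16::from_le_bytes(array(buf, 66)),
    })
}
```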
|
||||
|
||||
### Public surface (sketch)
|
||||
|
||||
```rust
pub struct DceRpcClient { /* tokio::net::TcpStream + auth + frag state */ }

impl DceRpcClient {
    pub async fn connect(addr: SocketAddr) -> Result<Self, RpcError>;
    pub async fn bind(&mut self, iid: Uuid, ntlm: NtlmContext) -> Result<(), RpcError>;
    pub async fn alter_context(&mut self, iid: Uuid) -> Result<(), RpcError>;
    pub async fn call(&mut self, opnum: u16, stub: &[u8]) -> Result<Bytes, RpcError>;
}
```
|
||||
|
||||
`tokio::net::TcpStream` underneath. The PDU codec is a pure `tokio_util::codec::Decoder` so the same logic could in principle drive a different runtime if the I/O is provided.
|
||||
|
||||
## `mxaccess-callback`
|
||||
|
||||
Server-side. Listens on an ephemeral TCP port; serves Bind / Request PDUs for two interfaces:
|
||||
|
||||
- `IRemUnknown` (RemQueryInterface, RemAddRef, RemRelease)
|
||||
- `INmxSvcCallback` (`DataReceived` opnum 3, `StatusReceived` opnum 4) — names match the .NET reference exactly: `src/MxNativeClient/NmxSvcCallbackMessages.cs:11–12` (`DataReceivedOpnum`/`StatusReceivedOpnum`) and `src/MxNativeClient/NmxProcedureMetadata.cs:89–101` (`NdrProcedureDescriptor` `DataReceived`/`StatusReceived`). The doc previously used a `Raw` suffix that does not appear in the source.
|
||||
|
||||
Source: `src/MxNativeClient/ManagedCallbackExporter.cs:1–335`.
|
||||
|
||||
```rust
pub struct CallbackExporter { /* TcpListener + dispatcher + frame channel */ }

impl CallbackExporter {
    pub async fn bind(addr: SocketAddr) -> Result<Self, CallbackError>;
    pub fn obj_ref(&self) -> ObjRef;
    pub fn frames(&self) -> impl Stream<Item = CallbackFrame>;
}

pub enum CallbackFrame {
    Data(Bytes),   // INmxSvcCallback::DataReceived payload (opnum 3)
    Status(Bytes), // INmxSvcCallback::StatusReceived payload (opnum 4)
}
```
|
||||
|
||||
Frames are not decoded here — they're forwarded to `mxaccess-codec::NmxSubscriptionMessage::parse` upstream. This keeps the callback exporter a transport, not a parser.
|
||||
|
||||
## `mxaccess-nmx`
|
||||
|
||||
`INmxService2` client. Sits on top of `mxaccess-rpc` and `mxaccess-callback`.
|
||||
|
||||
```rust
pub struct NmxClient { /* DceRpcClient + CallbackExporter handle + state */ }

impl NmxClient {
    pub async fn connect(host: &str, ids: EngineIds) -> Result<Self, NmxError>;
    pub async fn register_engine_2(
        &mut self, engine_id: i32, name: &str, version: i32, callback: ObjRef,
    ) -> Result<(), NmxError>;
    pub async fn unregister_engine(&mut self, engine_id: i32) -> Result<(), NmxError>;
    pub async fn get_partner_version(&mut self, ids: EngineIds) -> Result<i32, NmxError>;
    pub async fn transfer_data(&mut self, ids: EngineIds, body: &[u8]) -> Result<(), NmxError>;
    pub async fn add_subscriber_engine(&mut self, ...) -> Result<(), NmxError>;
    pub async fn remove_subscriber_engine(&mut self, ...) -> Result<(), NmxError>;
    pub async fn set_heartbeat_send_interval(
        &mut self, ticks_per_beat: i32, max_missed_ticks: i32,
    ) -> Result<(), NmxError>;
}
```
|
||||
|
||||
`transfer_data` accepts a pre-encoded body from `mxaccess-codec`. It does not decode; it forwards.
|
||||
|
||||
Includes the SQL tag resolver (`mxaccess-galaxy`):
|
||||
|
||||
```rust
pub struct GalaxyResolver { /* tiberius client */ }

impl GalaxyResolver {
    pub async fn connect(connection_string: &str) -> Result<Self, GalaxyError>;
    pub async fn resolve(&mut self, full_reference: &str) -> Result<GalaxyTagMetadata, GalaxyError>;
    pub async fn resolve_user(&mut self, guid: Uuid) -> Result<GalaxyUser, GalaxyError>;
}

pub struct GalaxyTagMetadata {
    pub object_tag_name: String,
    pub attribute_name: String,
    pub platform_id: u16,
    pub engine_id: u16,
    pub object_id: u16,
    pub mx_data_type: MxDataType,
    pub security_classification: SecurityClassification,
    pub is_array: bool,
}
```
|
||||
|
||||
The resolver does not compute the CRC — consumers do that via `MxReferenceHandle::compute_name_signature` so the codec stays self-contained and the resolver stays a thin SQL layer.
|
||||
|
||||
## `mxaccess-asb`
|
||||
|
||||
`IASBIDataV2` client. SOAP messages in WCF binary encoding over `net.tcp` framing. Independent of `mxaccess-rpc` and `mxaccess-nmx`; parallel data plane.
|
||||
|
||||
The wire is:
|
||||
- Net.Tcp framing (binary length-prefixed, see [MS-NMF]).
|
||||
- WCF binary message encoding ([MC-NBFX] tokenised XML + [MC-NBFS] static dictionary), **not** SOAP/XML on the wire — the .NET reference uses `new NetTcpBinding(SecurityMode.None)` with no encoder override (`src/MxAsbClient/MxAsbDataClient.cs:660-685`), which selects `BinaryMessageEncodingBindingElement` by default.
|
||||
- Custom binary inside `<ASBIData>` base64 elements (Variant, AsbStatus, MonitoredItem...) — the inner application payload, distinct from the message-envelope encoding.
|
||||
- Application-level auth: DH key exchange + per-message HMAC + AES-128.
|
||||
|
||||
Implementation: hand-rolled [MS-NMF] framing + [MC-NBFX]/[MC-NBFS] binary-XML codec in `mxaccess-asb-nettcp` (workspace-internal, published alongside the rest of the workspace), public surface in `mxaccess-asb`. Cross-platform reach is theoretically possible (no DCOM) but blocked by DPAPI for the shared secret on Windows AVEVA installs.
|
||||
|
||||
```rust
pub struct AsbClient { /* TcpStream + SOAP codec + auth + subscription state */ }

impl AsbClient {
    pub async fn connect(endpoint: Url, options: AsbConnectionOptions) -> Result<Self, AsbError>;
    pub async fn register(&mut self, item: &ItemIdentity) -> Result<RegisterResponse, AsbError>;
    pub async fn read(&mut self, item: &ItemIdentity) -> Result<AsbValue, AsbError>;
    pub async fn write(
        &mut self, item: &ItemIdentity, value: AsbValue, opts: WriteOptions,
    ) -> Result<WriteHandle, AsbError>;
    pub async fn create_subscription(
        &mut self, opts: SubscriptionOptions,
    ) -> Result<SubscriptionId, AsbError>;
    pub async fn add_monitored_items(
        &mut self, sid: SubscriptionId, items: &[MonitoredItem],
    ) -> Result<(), AsbError>;
    pub async fn publish(&mut self, sid: SubscriptionId) -> Result<Vec<PublishedValue>, AsbError>;
    pub async fn disconnect(&mut self) -> Result<(), AsbError>;
}
```
|
||||
|
||||
`AsbClient` is async-native (unlike the .NET reference, which is sync) because Tokio + non-blocking sockets is the natural fit for a long-poll subscription API.
|
||||
|
||||
## What the raw layer does **not** do
|
||||
|
||||
- It does not own a Tokio `Runtime`. It uses one when handed sockets but does not start one.
|
||||
- It does not surface `Stream<Item = DataChange>`. That is an async-layer concern. The raw callback exporter exposes a stream of undecoded `CallbackFrame` bytes (`CallbackExporter::frames`); the async layer parses them and demultiplexes by correlation.
|
||||
- It does not transform `MxStatus` into typed Rust errors. Status decoding is verbatim; the async layer maps to `Error` types (see `50-error-model.md`).
|
||||
- It does not reconnect or retry. Recovery is an async-layer policy.
|
||||
- It does not expose any sync wrappers. The raw types use `async fn` because every interesting operation is I/O-bound.
|
||||
@@ -0,0 +1,393 @@
|
||||
# Async layer (Tokio)
|
||||
|
||||
The async layer is the public face of the library. Most consumers depend on `mxaccess` (the top-level crate) and never see the raw crates. The crate is async-native, Tokio-based, and exposes idiomatic Rust: typed errors, `Send + Sync` handles, `Stream`s for subscriptions, drop-cancellable subscriptions, `tracing` instrumentation.
|
||||
|
||||
## Public crate: `mxaccess`
|
||||
|
||||
Re-exports the core types from `mxaccess-codec`, hides the raw transports, adds the async session.
|
||||
|
||||
### Connection
|
||||
|
||||
```rust
// `MX_LOCAL_ENGINE` is assumed to be exported by `mxaccess`; the export path
// is not fixed by this design.
use mxaccess::{Session, ConnectionOptions, Credentials, MxValue, MX_LOCAL_ENGINE};
use std::time::{Duration, SystemTime};

let session = Session::connect(
    ConnectionOptions::nmx("localhost")
        .galaxy_id(1)
        .platform_id(1)
        .engine_id(MX_LOCAL_ENGINE)
        .credentials(Credentials::current_user())
        .galaxy_db("Server=localhost;Database=Galaxy;Integrated Security=True;TrustServerCertificate=True"),
).await?;
```
|
||||
|
||||
`ConnectionOptions::nmx(...)` selects `NmxTransport`; `ConnectionOptions::asb(...)` selects `AsbTransport`. Both produce a `Session`.
|
||||
|
||||
`Session` is `Clone + Send + Sync`. Internally it wraps `Arc<SessionInner>` so cloned handles share the same underlying connection.
|
||||
|
||||
**Orderly shutdown — `Session::shutdown(timeout: Duration) -> impl Future<Output = Result<(), Error>>`.** Sends `UnAdvise` for every live subscription, then `UnregisterEngine`, and awaits the connection task's confirmation that those frames have flushed — or returns `Err(Error::Timeout(_))` if the timeout elapses first. This is the recommended exit path for production code and is the async equivalent of the .NET reference's synchronous `Dispose` (src/MxNativeClient/MxNativeSession.cs:476-514).
|
||||
|
||||
Drop of the last `Session` clone is a best-effort fallback: it signals `UnregisterEngine` to the connection task via the same in-process channel that subscription drops use (no `tokio::spawn`, no `block_on`). If the runtime is shut down before the connection task drains, the unregister is lost — see the runtime-shutdown leak note under "Cancellation". Callers that care about deterministic engine deregistration must call `Session::shutdown` rather than relying on drop.
|
||||
|
||||
### Operations
|
||||
|
||||
```rust
// Fire-and-forget: returns when the LMX `Write` RPC return is acked.
// No `WriteCompleted` callback is awaited.
session.write("TestChildObject.TestInt", MxValue::Int32(123)).await?;

// Awaits the 5-byte `OperationStatus` completion frame. The `client_token`
// correlates the wire callback to this call (see writeIndex/clientToken on
// `MxNativeSession.WriteAsync`, src/MxNativeClient/MxNativeSession.cs:165-185).
session.write_with_completion(
    "TestChildObject.TestInt",
    MxValue::Int32(123),
    /* client_token: */ 0x1001u32,
).await?;

session.write_with_timestamp(
    "TestChildObject.TestInt",
    MxValue::Int32(123),
    SystemTime::now(),
).await?;

// Verified Write — the LMX `WriteSecured` always takes TWO user ids:
// `(currentUserId, verifierUserId, value)`. "Single-user secured write" is
// callers passing the same id twice; it is NOT a separate API surface.
// `WriteSecured2` adds a timestamp; it does NOT add a second token. The
// `0x80004021` failure observed in `MxNativeSession.WriteSecuredAsync` is a
// defect of the .NET native reimplementation, not a real LMX constraint
// (verified against wwtools/mxaccesscli/docs/api-notes.md:60-72,87-95 and
// wwtools/mxaccesscli/src/MxAccess.Cli/Commands/WriteCommand.cs:44-101,151-155,196-199;
// the LMX proxy CLI exposes `WriteSecured(currentUserId, verifierUserId, value)`
// and treats single-user secured writes as `currentUserId == verifierUserId`).
session.write_secured(
    "TestChildObject.TestInt",
    MxValue::Int32(123),
    SecurityContext { current_user_id, verifier_user_id },
).await?;

// Timestamped Verified Write — adds a `SystemTime`. Same two-id token shape;
// matches `WriteSecured2(currentUserId, verifierUserId, value, timestamp)`.
session.write_secured_at(
    "TestChildObject.TestInt",
    MxValue::Int32(123),
    SystemTime::now(),
    SecurityContext { current_user_id, verifier_user_id },
).await?;

// `read` is implemented as `subscribe + first-result + drop`, mirroring
// `MxNativeSession.ReadAsync` (src/MxNativeClient/MxNativeSession.cs:312-359),
// which requires a positive timeout and unadvises on completion or timeout.
let DataChange { value, status, timestamp, .. } =
    session.read("TestChildObject.TestInt", Duration::from_secs(5)).await?;
```
|
||||
|
||||
All operations take a `&str` reference name (e.g. `"TestObject.Attribute"`) and resolve it to a `MxReferenceHandle` internally via the configured `Resolver`. Default resolver is `mxaccess-galaxy::SqlResolver`; an in-memory resolver is provided for tests (`InMemoryResolver::insert("Tag", metadata)`).
|
||||
|
||||
`Session::write` returns `Ok(())` once the LMX `Write` RPC has been acknowledged at the transport level — it does **not** await a wire `WriteCompleted` frame. Callers that need write-completion semantics must use `Session::write_with_completion(reference, value, client_token)`, which threads `client_token` through the `MxNativeSession.WriteAsync` `clientToken` parameter (src/MxNativeClient/MxNativeSession.cs:165-185) and returns when the matching 5-byte `OperationStatus` callback frame is decoded. See `70-risks-and-open-questions.md` R3/R4 for the cases where the proven stack does not emit a completion frame.
|
||||
|
||||
`Session::read` takes a `Duration` timeout (matching the .NET reference's mandatory `TimeSpan timeout` argument and `ArgumentOutOfRangeException` for non-positive values, src/MxNativeClient/MxNativeSession.cs:312-321). Implementation is `subscribe + first-result + drop`; the drop guard guarantees `UnAdvise` runs on the success, error, and timeout paths so no advise is leaked, mirroring the `finally`/`Unsubscribe` block at src/MxNativeClient/MxNativeSession.cs:351-358.
|
||||
|
||||
### Subscriptions
|
||||
|
||||
```rust
use futures::StreamExt;

let mut subscription = session.subscribe("TestChildObject.TestInt").await?;

while let Some(change) = subscription.next().await {
    let change = change?;
    println!("{} = {:?} @ {:?} (status={:?})",
        change.reference, change.value, change.timestamp, change.status);
}

// Drop the subscription to unadvise.
drop(subscription);
```
|
||||
|
||||
`Subscription` implements `Stream<Item = Result<DataChange, Error>>`.
|
||||
|
||||
**Err semantics — non-terminal for parse errors, terminal after connection loss.** The stream yields `Err` items for parse-level failures (the consumer can keep polling — the next inbound frame will be delivered). The stream ends with `None` after a final `Err` for connection-loss / subscription-end events; once `None` is observed, no further items will be yielded. This split mirrors the .NET reference's two events: `CallbackReceived` is raised per-record after a successful parse (src/MxNativeClient/MxNativeSession.cs:603-606), while `UnparsedCallbackReceived` is raised when `NmxSubscriptionMessage.ParseProcessDataReceivedBody` throws — without tearing down other live subscriptions (src/MxNativeClient/MxNativeSession.cs:590-601).
|
||||
|
||||
Consumers wanting strict parity with `UnparsedCallbackReceived` (raw bytes for unparseable frames) can subscribe to `Subscription::raw_callbacks() -> Stream<Item = RawCallback>`, or simply inspect the `Err` variants and keep polling: `Error::Protocol(ProtocolError::Decode { .. })` corresponds to the .NET unparsed-callback path and is non-terminal; `Error::Connection(_)` is terminal and the next `next().await` returns `None`.
|
||||
|
||||
Dropping the subscription sends `UnAdvise` (best-effort, fire-and-forget — see drop semantics below) and removes the correlation from the session's subscription map.
|
||||
|
||||
For batch subscriptions:
|
||||
|
||||
```rust
let mut sub = session.subscribe_many(&["A.X", "A.Y", "A.Z"]).await?;
while let Some(change) = sub.next().await {
    let change = change?;
    // change.reference identifies which of the three
}
```
|
||||
|
||||
Multi-tag subscriptions multiplex on the same callback channel and demultiplex by correlation ID inside the session task. The wire still issues one `Advise` per tag — the Rust API does not pretend a single advise covers many tags.
|
||||
|
||||
**`subscribe_many` is non-atomic.** The implementation issues one `AdviseSupervisory` per tag in a loop, mirroring `MxNativeSession.SubscribeAsync` which produces a fresh `CorrelationId` and calls `_service.AdviseSupervisory` per tag (src/MxNativeClient/MxNativeSession.cs:250-270). If the Nth advise fails, the first N-1 succeed and remain advised. The error is surfaced through the returned `Result`; the partial set lives on the returned `Subscription`, and the consumer chooses how to recover:
|
||||
- `Subscription::drop` to unadvise the partial set, or
|
||||
- retry the failed tag (e.g. via a follow-up `session.subscribe(failed_tag).await`).
|
||||
|
||||
`subscribe_many_atomic` (an all-or-nothing variant that rolls back on partial failure) is **not** provided in V1 — the proven .NET reference has no atomic equivalent and the wire offers no transactional advise primitive.
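
The non-atomic behaviour can be modelled synchronously with the standard library alone. `advise_each` and `PartialAdvise` are hypothetical, and the closure stands in for the per-tag `AdviseSupervisory` call. The real API is async and returns a `Subscription`.

```rust
/// Outcome of a per-tag advise loop: what succeeded, and the first failure.
#[derive(Debug, PartialEq, Eq)]
pub struct PartialAdvise {
    /// Tags advised before the loop stopped; these remain advised.
    pub advised: Vec<String>,
    /// `(tag, error message)` for the advise that failed, if any.
    pub failed: Option<(String, String)>,
}

/// Issues one advise per tag, in order, stopping at the first failure.
/// Nothing is rolled back: earlier advises stay live.
pub fn advise_each<F>(tags: &[&str], mut advise: F) -> PartialAdvise
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut advised = Vec::new();
    for &tag in tags {
        match advise(tag) {
            Ok(()) => advised.push(tag.to_string()),
            Err(e) => {
                return PartialAdvise { advised, failed: Some((tag.to_string(), e)) };
            }
        }
    }
    PartialAdvise { advised, failed: None }
}
```

With tags `["A.X", "A.Y", "A.Z"]` and a failure on `A.Y`, `A.X` remains advised and `A.Z` is never attempted. The caller then chooses between dropping the partial set and retrying the failed tag.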
|
||||
|
||||
### Buffered subscriptions (NMX only)
|
||||
|
||||
`subscribe_buffered` mirrors the .NET reference's `MxNativeSession.RegisterBufferedItemAsync`, which takes the dual-string `itemDefinition`/`itemContext` split plus an `itemHandle: int` (src/MxNativeClient/MxNativeSession.cs:272-310). The Rust API takes the same parameters explicitly via a `BufferedSubscription` request struct — no convenience overload that hides the `(definition, context, item_handle)` triple is offered, because a consumer that omits any of the three cannot reproduce the captured Frida bodies.
|
||||
|
||||
```rust
let mut sub = session.subscribe_buffered(BufferedSubscription {
    definition: "TestMachine_001.TestHistoryValue",
    context: "",         // optional, may be empty per RegisterBufferedItemAsync:279
    item_handle: 0x1001, // i32 mapped to NmxReferenceRegistrationMessage.ItemHandle
    options: BufferedOptions {
        sample_interval: Duration::from_millis(100),
        max_queue_size: 1000,
    },
}).await?;

while let Some(batch) = sub.next().await {
    for sample in batch?.samples {
        // sample is a DataChange
    }
}
```
|
||||
|
||||
`subscribe_buffered` is gated on the `nmx` feature and returns `Stream<Item = Result<DataChangeBatch, Error>>`. The deployed AVEVA provider may emit single-sample batches even with buffering enabled (see risk R2 in `70-risks-and-open-questions.md`). The API does not synthesise batches; if the wire returns one sample per record, each batch has `samples.len() == 1`.
|
||||
|
||||
### Recovery
|
||||
|
||||
**Recovery is caller-driven, not automatic.** This mirrors the .NET reference: `MxNativeSession.RecoverConnection` and `RecoverConnectionAsync(policy)` are explicit entry points the consumer invokes — the session never auto-starts recovery on heartbeat loss (src/MxNativeClient/MxNativeSession.cs:383-440). The Rust API exposes the same shape:
|
||||
|
||||
```rust
// Caller invokes recovery when they choose, with the policy of their choice.
session.recover_connection(RecoveryPolicy::exponential(
    Duration::from_secs(1),
    /* max_attempts: */ 5,
    Duration::from_secs(60),
)).await?;
```
|
||||
|
||||
Heartbeat-loss surfaces as `Error::Connection(...)` on subsequent operations (write/read/subscribe). The caller decides whether to call `recover_connection` based on observed errors and recovery events. There is no implicit recovery thread that resurrects the session in the background.
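
For illustration, one plausible reading of the three arguments to `RecoveryPolicy::exponential` is (initial delay, maximum attempts, maximum delay), with the delay doubling per attempt and capped at the maximum. That reading is an assumption. The struct and method below are hypothetical and only show the arithmetic.

```rust
use std::time::Duration;

/// Hypothetical policy shape: doubling delay, capped, bounded attempt count.
pub struct ExponentialPolicy {
    pub initial: Duration,
    pub max_attempts: u32,
    pub max_delay: Duration,
}

impl ExponentialPolicy {
    /// Delay associated with the 1-based `attempt`, or `None` once exhausted.
    pub fn delay_before(&self, attempt: u32) -> Option<Duration> {
        if attempt == 0 || attempt > self.max_attempts {
            return None;
        }
        // 2^(attempt-1), saturating instead of overflowing for large attempts.
        let factor = 1u32.checked_shl(attempt - 1).unwrap_or(u32::MAX);
        let uncapped = self.initial.checked_mul(factor).unwrap_or(self.max_delay);
        Some(uncapped.min(self.max_delay))
    }
}
```

With the values from the call above (1 s, 5 attempts, 60 s cap) this yields delays of 1, 2, 4, 8, and 16 seconds, so the cap is never reached. It only matters for longer attempt budgets.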
|
||||
|
||||
**In-flight calls during recovery fail.** While `recover_connection` is running, in-flight writes/reads/subscriptions against the previous transport are not paused, replayed, or migrated — they observe the existing transport being torn down and fail with `Error::Connection(...)`. The .NET reference's `_recoveryActive` is similarly just an inbound-callback annotation flag (src/MxNativeClient/MxNativeSession.cs:444,472); concurrent calls against `_service` are not interlocked. The Rust design **does not promise** "the future resumes on the new connection" — the caller is responsible for retrying after a successful `RecoveryEvent::Completed`.
|
||||
|
||||
```rust
let mut events = session.recovery_events();
while let Some(ev) = events.next().await {
    tracing::info!(?ev, "recovery event");
}

pub enum RecoveryEvent {
    Started { attempt: u32 },
    Failed { attempt: u32, error: Error, will_retry: bool },
    Completed { duration: Duration },
}
```
|
||||
|
||||
The event stream mirrors the .NET reference's `MxNativeSession.RecoveryAttemptStarted/Failed/Completed` events one-for-one (src/MxNativeClient/MxNativeSession.cs:121-123).
|
||||
|
||||
### Cancellation
|
||||
|
||||
Three cancellation surfaces, in order of preference for callers:
|
||||
|
||||
1. **Drop the future or handle.** Dropping a `Subscription` signals `UnAdvise` to the long-lived connection task; dropping a `Session` signals `UnregisterEngine` to the same task. **Drop never spawns a new Tokio task** — instead, `Subscription` holds a `tokio::sync::oneshot::Sender<UnAdviseRequest>` (or equivalent unbounded channel sender), and its `Drop` impl sends a message that the connection task drains in its event loop. Drop is therefore safe outside a runtime context and during runtime shutdown — it does not call `tokio::spawn` from `Drop`.
|
||||
2. **`tokio_util::sync::CancellationToken`.** Long operations (`subscribe_buffered`, recovery, `connect`) accept an optional `CancellationToken` via `*_with_cancellation` variants.
|
||||
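3. **Timeout.** `tokio::time::timeout` can wrap any operation. It cancels by dropping the inner future, and every public `async fn` is designed to be safe to drop at any `.await` point, with cleanup going through the same drop path as item 1.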
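
The drop-signals-a-channel pattern from item 1 can be sketched with the standard library. Here `std::sync::mpsc` stands in for the Tokio channel, and `ConnectionCommand`, `Subscription`, and `new_subscription` are hypothetical. The property being shown is that `Drop` only performs a non-blocking send and never spawns a task.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Messages the long-lived connection task drains in its event loop.
#[derive(Debug, PartialEq, Eq)]
pub enum ConnectionCommand {
    UnAdvise { correlation_id: u32 },
}

/// Handle whose drop requests an UnAdvise without touching any runtime.
pub struct Subscription {
    correlation_id: u32,
    commands: Sender<ConnectionCommand>,
}

impl Drop for Subscription {
    fn drop(&mut self) {
        // Best-effort: if the connection task is already gone the send
        // fails, and the error is deliberately ignored.
        let _ = self.commands.send(ConnectionCommand::UnAdvise {
            correlation_id: self.correlation_id,
        });
    }
}

/// Test helper: returns a subscription plus the receiving end that the
/// connection task would own.
pub fn new_subscription(correlation_id: u32) -> (Subscription, Receiver<ConnectionCommand>) {
    let (tx, rx) = channel();
    (Subscription { correlation_id, commands: tx }, rx)
}
```

Because the send cannot block and cannot panic when the receiver is gone, dropping a handle after the connection task has exited is harmless. The `UnAdvise` is simply lost, which is the best-effort behaviour described in this section.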
|
||||
|
||||
**Known runtime-shutdown leak.** If the Tokio runtime is shut down before the connection task has drained pending `UnAdvise`/`UnregisterEngine` messages, those frames are not delivered to the wire. Production code should avoid this by calling `Session::shutdown(timeout).await` (see "Connection" above) on the orderly-exit path. The .NET reference has the same shape: `Dispose` runs synchronously and calls `_service.UnAdvise(...)` per live subscription before `UnregisterEngine` (src/MxNativeClient/MxNativeSession.cs:483-495). The Rust async equivalent is `Session::shutdown`; relying on `Drop` alone for cleanup is best-effort and documented as such.
|
||||
|
||||
### Error model
|
||||
|
||||
`mxaccess::Error` is a `thiserror`-derived `#[non_exhaustive]` enum. See `50-error-model.md` for the full surface. All operations return `Result<T, Error>`; **no panics in the public surface**.
|
||||
|
||||
A non-Ok `MxStatus` on a returned `DataChange` is data, not an error. A non-Ok status on `read`/`write`/`subscribe`'s synchronous result is an `Err`. This mirrors the .NET reference and is the only sensible split: subscription frames carry status that callers want to inspect ("stale" or "uncertain" is still data); operation results are pass/fail.
|
||||
|
||||
### Threading model
|
||||
|
||||
- Multi-thread Tokio is the default. Single-thread is supported (no `Send`-only future escapes the local thread) but not the recommended deployment.
|
||||
- All public types are `Send + Sync`.
|
||||
- The codec is `Send + Sync` trivially (immutable after parse, owns its bytes).
|
||||
- The session uses `tokio::sync::Mutex` for the per-connection RPC channel state and `tokio::sync::watch` for recovery state. **No `parking_lot::Mutex`**: a synchronous mutex held across an `.await` point blocks the executor thread and hides that blocking from the scheduler.
|
||||
|
||||
### Observability
|
||||
|
||||
`tracing` spans on every public operation: `tracing::instrument` on `register`, `write`, `subscribe`, `read`, `recover`. Span fields: `reference`, `correlation_id`, `transport` (`nmx`|`asb`), `engine_ids`. Span events for state transitions (subscription added, callback received, recovery started, recovery completed).
|
||||
|
||||
Recommended subscriber filter:
|
||||
```
|
||||
mxaccess::session=info,mxaccess::transport=debug
|
||||
```
|
||||
|
||||
Optional `metrics` feature exposes:
|
||||
- counters: `mxaccess_writes_total`, `mxaccess_subscribes_total`, `mxaccess_callbacks_total`, `mxaccess_recoveries_total`
|
||||
- histograms: `mxaccess_operation_latency_seconds{op="write"|"read"|"subscribe"}`
|
||||
|
||||
### Trait `Transport`
|
||||
|
||||
`Transport` uses **native `async fn` in trait** (AFIT, stable in Rust 1.75+) and is **generic-only**. Consumers parameterise sites that take a transport with `impl Transport` or a generic `<T: Transport>` bound. The trait is **not** dyn-compatible — `Box<dyn Transport>` is not supported in V1 — and that limitation is intentional: the design already uses `Session::connect<T: Transport>(...)`-style generic entry points, so giving up `dyn Transport` costs nothing the design currently uses, while keeping zero per-call heap allocation in the hot path. (`#[async_trait]` was the alternative; it allows `dyn Transport` but boxes a `Pin<Box<dyn Future>>` per call — accepted as a known cost only if a future revision needs runtime polymorphism.)
|
||||
|
||||
```rust
|
||||
pub trait Transport: Send + Sync {
|
||||
fn capabilities(&self) -> TransportCapabilities;
|
||||
|
||||
async fn register(&self, options: &ConnectionOptions) -> Result<RegisteredEngine, Error>;
|
||||
async fn unregister(&self, engine: &RegisteredEngine) -> Result<(), Error>;
|
||||
|
||||
async fn write(
|
||||
&self,
|
||||
handle: &MxReferenceHandle,
|
||||
value: &MxValue,
|
||||
opts: WriteOptions,
|
||||
) -> Result<WriteOutcome, Error>;
|
||||
|
||||
async fn advise(
|
||||
&self,
|
||||
handle: &MxReferenceHandle,
|
||||
opts: AdviseOptions,
|
||||
) -> Result<SubscriptionHandle, Error>;
|
||||
|
||||
async fn unadvise(&self, sub: SubscriptionHandle) -> Result<(), Error>;
|
||||
|
||||
fn callbacks(&self) -> CallbackStream;
|
||||
}
|
||||
|
||||
pub struct TransportCapabilities {
|
||||
pub timestamped_writes: bool,
|
||||
pub secured_writes: bool,
|
||||
pub buffered_subscriptions: bool,
|
||||
pub supervisory_advise: bool,
|
||||
pub operation_complete_events: bool,
|
||||
}
|
||||
```
|
||||
|
||||
Two implementations: `NmxTransport` (capabilities mostly true) and `AsbTransport` (capabilities mostly false; see `70-risks-and-open-questions.md`).
|
||||
|
||||
Calling an NMX-only API on an `AsbTransport` returns `Error::Unsupported { operation: Cow<'static, str>, transport: TransportKind }`. The `Cow` is used so the variant accepts both interned `&'static str` literals (the common case) and runtime-formatted operation names without allocation when not required; `TransportKind` is the corresponding `enum TransportKind { Nmx, Asb }` (matching the `transport` span field described under Observability). The `Session` may also pre-flight via `transport.capabilities()` to give a better error message before issuing the call.
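
A minimal sketch of such a pre-flight check, assuming std-only stand-ins for the types named above. The helper `require_buffered` and the operation name `"subscribe_buffered"` are hypothetical; only the `Error::Unsupported` shape, `TransportKind`, and the capability field names come from this document.

```rust
use std::borrow::Cow;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransportKind {
    Nmx,
    Asb,
}

/// Same field names as the `TransportCapabilities` struct in this section.
#[derive(Debug, Clone, Copy, Default)]
pub struct TransportCapabilities {
    pub timestamped_writes: bool,
    pub secured_writes: bool,
    pub buffered_subscriptions: bool,
    pub supervisory_advise: bool,
    pub operation_complete_events: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    Unsupported {
        operation: Cow<'static, str>,
        transport: TransportKind,
    },
}

/// Hypothetical pre-flight helper: reject a buffered subscription before any
/// frame is sent if the transport does not advertise the capability.
pub fn require_buffered(
    caps: TransportCapabilities,
    transport: TransportKind,
) -> Result<(), Error> {
    if caps.buffered_subscriptions {
        Ok(())
    } else {
        Err(Error::Unsupported {
            // A literal borrows, so the common case allocates nothing.
            operation: Cow::Borrowed("subscribe_buffered"),
            transport,
        })
    }
}
```

The same pattern would apply to the other capability flags; a runtime-formatted operation name would use `Cow::Owned(format!(...))` instead.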
|
||||
|
||||
### Public surface (re-exports)
|
||||
|
||||
```rust
|
||||
pub use mxaccess_codec::{
|
||||
MxReferenceHandle, MxStatus, MxStatusCategory, MxStatusSource,
|
||||
MxValue, MxValueKind, MxDataType,
|
||||
};
|
||||
|
||||
pub struct Session;
|
||||
pub struct Subscription;
|
||||
|
||||
pub struct DataChange {
|
||||
pub reference: Arc<str>,
|
||||
pub value: MxValue,
|
||||
/// Legacy 16-bit OPC quality (e.g. `0xC0` = 192 = "Good"). Distinct from
|
||||
/// `status: MxStatus` — both are surfaced because real MxAccess
|
||||
/// (`OnDataChange(hServer, hItem, MxDataType, value, quality, timestamp,
|
||||
/// statuses)`) carries them as separate fields. Verified against
|
||||
/// `wwtools/mxaccesscli/docs/api-notes.md:104-105` ("quality on
|
||||
/// OnDataChange is the legacy 16-bit OPC quality value … the richer state
|
||||
/// lives in the statuses[] array") and
|
||||
/// `wwtools/mxaccesscli/src/MxAccess.Cli/Mx/MxUpdate.cs:13-22,39-65`.
|
||||
/// Earlier drafts of this design dropped `quality` as redundant with
|
||||
/// `status`; that was a parity break and has been restored.
|
||||
pub quality: u16,
|
||||
pub timestamp: SystemTime,
|
||||
pub status: MxStatus,
|
||||
}
|
||||
pub struct DataChangeBatch {
|
||||
pub reference: Arc<str>,
|
||||
pub samples: Vec<DataChange>,
|
||||
}
|
||||
// Note on `quality`: `DataChange` carries a 16-bit OPC quality alongside
|
||||
// `status: MxStatus`. They are distinct: `quality` is the legacy wire field
|
||||
// (e.g. `0xC0` = "Good"), preserved for parity with real MxAccess
|
||||
// (`OnDataChange` exposes both). The canonical projection from a wire record
|
||||
// to a typed status is `Record.ToDataChangeStatus()` in the .NET reference
|
||||
// (src/MxNativeClient/MxNativeSession.cs:70), which produces an `MxStatus`.
|
||||
// Consumers that need the historical "quality" view (Good/Uncertain/Bad on
// bits 7..6) read it from `quality`; the richer typed state lives in
// `status` (`status.detail`, `status.category`). The two fields are kept
// separate because dropping `quality` was a parity break (see the field
// doc comment on `DataChange::quality`).
|
||||
|
||||
pub struct ConnectionOptions;
|
||||
pub struct WriteOptions;
|
||||
pub struct AdviseOptions;
|
||||
pub struct BufferedOptions;
|
||||
pub struct RecoveryPolicy;
|
||||
pub enum RecoveryEvent { Started { .. }, Failed { .. }, Completed { .. } }
|
||||
|
||||
pub struct Credentials;
|
||||
pub struct SecurityContext;
|
||||
|
||||
pub enum Error; // see 50-error-model.md
|
||||
```
|
||||
|
||||
## `mxaccess-compat` (optional)
|
||||
|
||||
`LMXProxyServer`-shaped methods on top of `Session`. Each method maps one-to-one to a `Session::*` operation:
|
||||
|
||||
```rust
|
||||
let server = mxaccess_compat::Server::new(session);
|
||||
let server_handle = server.register("MyClient");
|
||||
let item_handle = server.add_item(server_handle, "TestObject.TestInt").await?;
|
||||
server.advise(server_handle, item_handle).await?;
|
||||
server.write(server_handle, item_handle, MxValue::Int32(123), user_id).await?;
|
||||
```
|
||||
|
||||
Useful for porting code that depends on the COM API shape; not the recommended consumer surface. Not COM-visible by itself; a separate `mxaccess-compat-com` crate (deferred to post-V1) will register `windows-rs`-generated COM classes that wrap this.
|
||||
|
||||
## Examples
|
||||
|
||||
End-to-end consumer-grade examples in `rust/crates/mxaccess/examples/`:
|
||||
|
||||
- `connect-write-read.rs` — open session, write, read back
|
||||
- `subscribe.rs` — long-running subscription
|
||||
- `subscribe-buffered.rs` — buffered subscription (NMX feature)
|
||||
- `asb-subscribe.rs` — ASB subscription
|
||||
- `recovery.rs` — recovery policy + recovery events
|
||||
- `multi-tag.rs` — `subscribe_many` on a 100-tag set
|
||||
- `secured-write.rs` — `write_secured` (no timestamp) and `write_secured_at` (timestamped), each taking `(current_user_id, verifier_user_id)`; demonstrates both single-user (`current == verifier`) and two-person verification paths
|
||||
|
||||
## Code sample (full)
|
||||
|
||||
```rust
|
||||
use std::time::Duration;
|
||||
use futures::StreamExt;
|
||||
use mxaccess::{Session, ConnectionOptions, Credentials, MxValue, RecoveryPolicy};
|
||||
|
||||
#[tokio::main(flavor = "multi_thread")]
|
||||
async fn main() -> anyhow::Result<()> {
|
||||
tracing_subscriber::fmt::init();
|
||||
|
||||
let mut session = Session::connect(
|
||||
ConnectionOptions::nmx("localhost")
|
||||
.galaxy_id(1)
|
||||
.platform_id(1)
|
||||
.engine_id(420)
|
||||
.credentials(Credentials::current_user())
|
||||
.galaxy_db(std::env::var("MX_GALAXY_DB")?)
|
||||
.recovery(RecoveryPolicy::exponential(
|
||||
Duration::from_secs(1), 5, Duration::from_secs(60),
|
||||
)),
|
||||
).await?;
|
||||
|
||||
session.write("TestChildObject.TestInt", MxValue::Int32(123)).await?;
|
||||
|
||||
let mut sub = session.subscribe("TestChildObject.TestInt").await?;
|
||||
while let Some(change) = sub.next().await {
|
||||
let change = change?;
|
||||
tracing::info!(value = ?change.value, ts = ?change.timestamp, "data change");
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
```
|
||||
|
||||
## What the async layer does **not** do
|
||||
|
||||
- It does not pretend to be sync. There is no `block_on` shortcut in the public API.
|
||||
- It does not support multiple async runtimes. Tokio only.
|
||||
- It does not transmit raw bytes. All operations go through the codec.
|
||||
- It does not retry by default. Recovery is opt-in via `ConnectionOptions::recovery(...)` at session construction. There is no runtime mutator for `RecoveryPolicy`: `Session` is `Clone` and backed by `Arc<SessionInner>`, so a `&mut self` setter on one clone would not propagate to the others; the policy is fixed once and shared by every clone. Consumers that need a different policy build a new `Session`.
|
||||
- It does not own a thread pool. It uses Tokio's runtime.
|
||||
- It does not synthesise events the wire does not produce. `WriteCompleted` only fires when the proven 5-byte completion frame is observed; otherwise `RawStatus` is exposed verbatim through `Session::operation_status_events()`. See `70-risks-and-open-questions.md` R3/R4.
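
The fixed-policy point in the list above can be illustrated with a std-only sketch. The `RecoveryPolicy` fields, `Session::new`, and `shares_state_with` are hypothetical stand-ins; the only detail taken from this document is that `Session` is `Clone` and backed by an `Arc<SessionInner>`.

```rust
use std::sync::Arc;
use std::time::Duration;

/// Hypothetical fields; the real `RecoveryPolicy` is opaque in this design.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RecoveryPolicy {
    pub initial_delay: Duration,
    pub max_attempts: u32,
}

struct SessionInner {
    // Set once at construction; there is deliberately no setter.
    recovery: Option<RecoveryPolicy>,
}

#[derive(Clone)]
pub struct Session {
    inner: Arc<SessionInner>,
}

impl Session {
    pub fn new(recovery: Option<RecoveryPolicy>) -> Self {
        Session {
            inner: Arc::new(SessionInner { recovery }),
        }
    }

    /// Every clone reads the same policy through the shared `Arc`.
    pub fn recovery(&self) -> Option<&RecoveryPolicy> {
        self.inner.recovery.as_ref()
    }

    /// True when both handles point at the same shared state.
    pub fn shares_state_with(&self, other: &Session) -> bool {
        Arc::ptr_eq(&self.inner, &other.inner)
    }
}
```

Because cloning only bumps the reference count, a different policy requires constructing a second `Session` rather than mutating an existing one.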
|
||||
@@ -0,0 +1,295 @@
|
||||
# Crate topology
|
||||
|
||||
## Workspace layout
|
||||
|
||||
```
|
||||
rust/
|
||||
Cargo.toml workspace root
|
||||
Cargo.lock
|
||||
rust-toolchain.toml 1.85, stable (matches workspace.package.rust-version)
|
||||
crates/
|
||||
mxaccess-codec/ pure protocol codec, no I/O
|
||||
mxaccess-galaxy/ Galaxy SQL resolver (tiberius)
|
||||
mxaccess-rpc/ DCE/RPC + NTLMv2 + OXID + OBJREF
|
||||
mxaccess-callback/ INmxSvcCallback RPC server
|
||||
mxaccess-nmx/ INmxService2 client
|
||||
mxaccess-asb-nettcp/ net.tcp framing: MC-NMF + MC-NBFX/NBFS binary message encoder
|
||||
(NetTcpBinding default — see src/MxAsbClient/MxAsbDataClient.cs:660-685)
|
||||
mxaccess-asb/ IASBIDataV2 client
|
||||
mxaccess/ async session + Transport trait + public API
|
||||
examples/ `cargo run --example` only resolves examples
|
||||
connect-write-read.rs owned by a specific crate, so the public-facing
|
||||
subscribe.rs examples live under the top-level `mxaccess`
|
||||
subscribe-buffered.rs crate and are invoked with `-p mxaccess`.
|
||||
asb-subscribe.rs
|
||||
recovery.rs
|
||||
multi-tag.rs
|
||||
secured-write.rs
|
||||
mxaccess-compat/ LMXProxyServer-shaped facade (optional)
|
||||
tests/
|
||||
fixtures/ copy of ../captures/0NN-frida-* (junctions are Windows-only and don't survive `git clone` cross-platform; symlinks need Developer Mode on Windows — copy is the portable default)
|
||||
```
|
||||
|
||||
The workspace lives at `c:\Users\dohertj2\Desktop\mxaccess\rust\` (sibling of `src/`) per the `CLAUDE.md` directive. The .NET tooling does not look there; cargo treats it as the workspace root.
|
||||
|
||||
## Dependency graph
|
||||
|
||||
```
|
||||
+----------------+
|
||||
| mxaccess-codec |
|
||||
+----------------+
|
||||
^ ^
|
||||
| |
|
||||
+-----------+ +-----------+
|
||||
| |
|
||||
+---------------+ +-------------------+
|
||||
| mxaccess-rpc | | mxaccess-asb-nettcp |
|
||||
+---------------+ +-------------------+
|
||||
^ ^
|
||||
| |
|
||||
+-------+---------+ +-------+--------+
|
||||
| mxaccess-callback| | mxaccess-asb |
|
||||
+------------------+ +----------------+
|
||||
^ ^
|
||||
| |
|
||||
+-------+----------+ |
|
||||
| mxaccess-nmx | |
|
||||
+------------------+ |
|
||||
^ +-------------------+ |
|
||||
| | mxaccess-galaxy | |
|
||||
| +-------------------+ |
|
||||
| ^ |
|
||||
+--------------+-------------------+
|
||||
|
|
||||
+---------------+
|
||||
| mxaccess | (top-level async API)
|
||||
+---------------+
|
||||
^
|
||||
|
|
||||
+-----------------+
|
||||
| mxaccess-compat | (LMXProxyServer shape)
|
||||
+-----------------+
|
||||
```
|
||||
|
||||
No cycles. ASB and NMX paths never depend on each other. The `mxaccess-nmx → mxaccess-galaxy` arrow is feature-gated behind `galaxy-resolver` (default-on); consumers building NMX with a custom `Resolver` can drop it.
|
||||
|
||||
## Per-crate detail
|
||||
|
||||
### `mxaccess-codec`
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Role | Pure encoder/decoder for NMX wire types and ASB variant |
|
||||
| Targets | All Rust targets *theoretically* (codec is pure, no platform-bound deps); Linux/macOS support is a **stretch goal**, gated on `MX_LIVE` integration tests against a remote AVEVA install. See `60-roadmap.md` and `70-risks-and-open-questions.md` Q3. |
|
||||
| Public deps | `bytes`, `byteorder`, `uuid`, `widestring`, `thiserror` |
|
||||
| Private deps | `proptest` (dev) |
|
||||
| Optional features | `serde` (derives `Serialize`/`Deserialize` on public types) |
|
||||
| Tests | Round-trip every captured fixture; proptest generators for primitives; cross-implementation parity vs `dotnet run --project src\MxNativeCodec.Tests` |
|
||||
|
||||
### `mxaccess-galaxy`
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Role | Galaxy Repository SQL resolver: tag → metadata, user → identity |
|
||||
| Targets | All Rust targets, but Linux integrated-security is a stretch goal — see auth note below |
|
||||
| Deps | `mxaccess-codec`, `tokio`, `tokio-util`, `futures-util`, `thiserror`, `tracing` (`tiberius` is now an optional dep, see below) |
|
||||
| Optional features | `galaxy-resolver` (default-off; pulls `tiberius` and exposes the SQL-backed resolver. Consumers that only need NMX/ASB with a custom `Resolver` impl can leave this off and avoid pulling TDS, native-tls/rustls, and the `winauth` stack). `auth-windows` (default-on for Windows when `galaxy-resolver` is on; selects `tiberius`'s `winauth` SSPI feature for integrated security against domain-joined SQL Server). On Linux, `auth-windows` does **not** apply: integrated security against an MSSQL Galaxy DB requires `tiberius`'s `integrated-auth-gssapi` feature plus a configured Kerberos KDC and `krb5.conf` on the client. Galaxy databases in practice are domain-joined Windows boxes using NTLM/Kerberos integrated auth, so Linux clients without an MIT/Heimdal stack will fail to authenticate; flagged as a **stretch goal** and tracked in `70-risks-and-open-questions.md`. SQL-login fallback is always available cross-platform. |
|
||||
| Tests | Mock SQL fixtures; live integration test gated on `MX_GALAXY_DB` env var |
|
||||
|
||||
The .NET reference keeps `GalaxyRepositoryTagResolver.cs` inside the `MxNativeClient` namespace (`src/MxNativeClient/GalaxyRepositoryTagResolver.cs:4`). Splitting it into `mxaccess-galaxy` is a Rust-side improvement, not a porting fault: the resolver's only inputs are SQL connection options, its only output is `MxReferenceHandle` (a `mxaccess-codec` type), and the Rust trait `Resolver` is exposed by `mxaccess-nmx` so consumers can plug in their own implementation. With `galaxy-resolver` feature-gated, `mxaccess-nmx` does not transitively pull `tiberius` for consumers who do not need it.
|
||||
|
||||
**Resolver input contract — `tag_name`-form only.** The Galaxy DB carries two distinct name fields per object: `tag_name` (the runtime read/write name, e.g. `DelmiaReceiver_001`) and `contained_name` (the hierarchy-browsing path, e.g. `TestMachine_001.DelmiaReceiver`). These are **asymmetric and cannot be used interchangeably** — `wwtools/grdb/README.md` calls this out as a critical distinction. `GalaxyRepositoryTagResolver.ResolveSql` keys on `g.tag_name = @objectTagName`; passing a contained-name will silently miss. The Rust `Resolver` trait takes a `tag_name`-form `&str` (e.g. `"TestObject.TestInt"` resolves the `TestObject` tag plus the `TestInt` attribute on it). If a future consumer needs contained-name → tag-name translation, add it as a separate translator that calls `wwtools/grdb/queries/hierarchy.sql`-style logic; **do not** mix the two paths inside `mxaccess-galaxy`.
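
As a rough illustration of the input shape, the sketch below splits a `tag_name`-form reference into its object and attribute parts. The function name `split_reference` is hypothetical, and the code assumes the object `tag_name` itself contains no dots, so the first `.` separates object from attribute.

```rust
/// Splits a `tag_name`-form reference such as `TestObject.TestInt` into
/// `(object tag_name, attribute path)`.
///
/// Assumption (not confirmed by the reference implementation): the object
/// `tag_name` never contains a dot, so the first '.' is the separator and
/// everything after it belongs to the attribute.
pub fn split_reference(reference: &str) -> Option<(&str, &str)> {
    let (object, attribute) = reference.split_once('.')?;
    if object.is_empty() || attribute.is_empty() {
        return None;
    }
    Some((object, attribute))
}
```

Note that a contained-name such as `TestMachine_001.DelmiaReceiver` splits just as cleanly, with `TestMachine_001` taken as the object. A purely syntactic check cannot tell the two forms apart, which is why the contract has to be stated rather than enforced by parsing.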
|
||||
|
||||
**Galaxy schema version probe.** R10 in `70-risks-and-open-questions.md` flags older Galaxy schema layouts as untested. `wwtools/grdb/` confirms a `dbo.schema_version` table exists with `version_number` / `version_string` / `cdi_version` columns. The Rust resolver should query this at session construction and fail loud (`ConfigError::Galaxy { reason: format!("schema version {version_string} is outside tested range") }`) if the version is outside the proven set.
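
A minimal sketch of the fail-loud check, assuming the `version_string` has already been read from `dbo.schema_version`. The `check_schema_version` helper and the `tested` parameter are hypothetical; the proven version set is not listed in this document, so it is passed in rather than hard-coded.

```rust
/// Stand-in for the configuration error named above.
#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    Galaxy { reason: String },
}

/// Accepts the schema only if its `version_string` is in the caller-supplied
/// list of tested versions; anything else is rejected at construction time.
pub fn check_schema_version(
    version_string: &str,
    tested: &[&str],
) -> Result<(), ConfigError> {
    if tested.contains(&version_string) {
        Ok(())
    } else {
        Err(ConfigError::Galaxy {
            reason: format!("schema version {version_string} is outside tested range"),
        })
    }
}
```

An exact-match list is the most conservative reading of "outside the proven set"; a range comparison on `version_number` would be an alternative if the versions turn out to be ordered.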
|
||||
|
||||
### `mxaccess-rpc`
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Role | DCE/RPC PDU codec + NTLMv2 + OBJREF + OXID resolution + RemQI |
|
||||
| Targets | `x86_64-pc-windows-msvc` (primary), `x86_64-pc-windows-gnu`, `x86_64-unknown-linux-gnu` (NTLM-only paths) |
|
||||
| Deps | `mxaccess-codec`, `bytes`, `byteorder`, `tokio`, `hmac`, `md-5`, `rc4`, `rand`, `uuid`, `thiserror`, `tracing` (all crypto crates pinned to the `digest 0.11`/`cipher 0.5` generation per the workspace dependency table) |
|
||||
| Optional features | `windows-com` (default-on Windows; pulls `windows` for `GUID`/`ObjRef` helpers) |
|
||||
| Tests | Unit tests for NTLM message construction + PDU framing; integration tests against captured ObjectExporter responses |
|
||||
|
||||
### `mxaccess-callback`
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Role | TCP listener + RPC server for `INmxSvcCallback` and `IRemUnknown` |
|
||||
| Deps | `mxaccess-rpc`, `mxaccess-codec`, `tokio`, `futures-util`, `tracing`, `thiserror` |
|
||||
| Tests | Unit test that exercises the dispatch table with synthetic Bind/Request PDUs |
|
||||
|
||||
### `mxaccess-nmx`
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Role | `INmxService2` client + raw NMX session façade. Exposes a `Resolver` trait so consumers can plug in any tag-handle resolver. |
|
||||
| Deps | `mxaccess-codec`, `mxaccess-rpc`, `mxaccess-callback`, `tokio`, `tracing`, `thiserror` |
|
||||
| Optional features | `galaxy-resolver` (default-on; pulls `mxaccess-galaxy` and re-exports its SQL-backed `Resolver` impl. Off → `mxaccess-nmx` builds without `tiberius`/TDS, and the consumer supplies their own `Resolver`.) |
|
||||
| Tests | Round-trip TransferData fixtures; live probe gated on `MX_LIVE` |
|
||||
|
||||
### `mxaccess-asb-nettcp`
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Role | net.tcp framing layer. Implements MC-NMF (.NET Message Framing) + MC-NBFX/NBFS (.NET Binary XML / dictionary string table) — the default binary message encoder for `NetTcpBinding`. Reference WCF construction in `src/MxAsbClient/MxAsbDataClient.cs:660-685` is `new NetTcpBinding(SecurityMode.None)` with no encoder override, which selects `BinaryMessageEncodingBindingElement` by default — i.e. *not* SOAP/XML on the wire. The previous name `mxaccess-asb-soap` was a misnomer. |
|
||||
| Visibility | Workspace-internal crate, published alongside the rest of the workspace (no `publish = false` — Cargo refuses `cargo publish` of `mxaccess-asb` if a path-dep here lacks a published version). |
|
||||
| Deps | `bytes`, `tokio`, an MC-NBFX/NBFS implementation (TODO: evaluate the `wcf-binary` crate or hand-roll a dictionary-table codec), `quick-xml` (only for the small ASB control-plane XML payloads such as `request.ToXml()` at `src/MxAsbClient/AsbSystemAuthenticator.cs:79`, *not* for net.tcp framing), `flate2`, `aes`, `hmac`, `md-5`, `sha1` (note the crate rename: `sha-1` is deprecated upstream and `sha1` is the maintained successor), `sha2`, `pbkdf2`, `num-bigint`, `rand`, `tracing`. All RustCrypto crates are pinned to the `digest 0.11`/`cipher 0.5` generation; see the workspace `[workspace.dependencies]` table. |
|
||||
| Tests | Unit tests for net.tcp/MC-NMF framing + DH handshake against captured payloads from `AsbMessageDumpBehavior` |
|
||||
|
||||
### `mxaccess-asb`
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Role | `IASBIDataV2` client |
|
||||
| Deps | `mxaccess-codec`, `mxaccess-asb-nettcp`, `tokio`, `tracing`, `thiserror` |
|
||||
| Optional features | `dpapi` (default-on for Windows targets) — see `SecretProvider` below |
|
||||
| Tests | Round-trip Variant fixtures; live probe gated on `MX_LIVE` |
|
||||
|
||||
`mxaccess-asb` always exposes a `SecretProvider` trait (single fallible `async fn fetch(&self) -> Result<Zeroizing<Vec<u8>>, Error>`) that the ASB authenticator calls to obtain the shared secret used for the DH-passphrase derivation (`src/MxAsbClient/AsbSystemAuthenticator.cs:28, 134-142` — the secret is mandatory for the handshake; without it ASB cannot authenticate). The trait is **always present** — not feature-gated — so consumers can plug in any source (env var, file, KeyVault, hardcoded test fixture). The `dpapi` feature provides a default Windows-only implementation that reads the secret via `windows::Win32::Security::Cryptography::CryptUnprotectData`. With `dpapi=off`, the crate still compiles and works; the consumer must provide a `SecretProvider` impl explicitly, otherwise `Session::builder()` fails at construction time with `Error::ConfigurationIncomplete { missing: "secret_provider" }`.
|
||||
|
||||
### `mxaccess`
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Role | Public async API: `Session`, `Subscription`, `Transport` trait |
|
||||
| Deps | `mxaccess-codec`, `tokio`, `tokio-util`, `futures-util`, `tracing`, `thiserror`, `async-trait`, `arc-swap` (for cheap clones of session state) |
|
||||
| Optional features | `nmx` (default-on Windows; pulls `mxaccess-nmx`), `asb` (default-on; pulls `mxaccess-asb`), `metrics` (optional `metrics` instrumentation), `serde` (forwards to codec) |
|
||||
| Tests | Integration tests gated on env vars; in-memory `Transport` for unit tests |
|
||||
|
||||
### `mxaccess-compat`
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Role | `LMXProxyServer`-shaped methods on top of `Session` |
|
||||
| Deps | `mxaccess` |
|
||||
| Tests | Method-equivalence tests against captured `MxNativeCompatibilityServer` outputs |
|
||||
|
||||
## Build / test commands
|
||||
|
||||
To be added to `CLAUDE.md` "Common commands" once `rust/` exists:
|
||||
|
||||
```powershell
|
||||
# Workspace-wide
|
||||
cargo build --workspace
|
||||
cargo test --workspace
|
||||
cargo clippy --workspace -- -D warnings
|
||||
cargo fmt --all -- --check
|
||||
|
||||
# Single crate
|
||||
cargo build -p mxaccess-codec
|
||||
cargo test -p mxaccess-codec
|
||||
|
||||
# Live integration tests (require AVEVA install + Galaxy DB)
|
||||
$env:MX_LIVE = "1"
|
||||
$env:MX_GALAXY_DB = "Server=localhost;Database=Galaxy;Integrated Security=True;TrustServerCertificate=True"
|
||||
$env:MX_NMX_HOST = "localhost"
|
||||
cargo test -p mxaccess --features live -- --ignored
|
||||
|
||||
# Examples (live under `crates/mxaccess/examples/`; `-p mxaccess` is required because
|
||||
# `cargo run --example` only resolves examples that belong to a specific crate)
|
||||
cargo run -p mxaccess --example connect-write-read
|
||||
cargo run -p mxaccess --example subscribe -- --tag TestChildObject.TestInt
|
||||
cargo run -p mxaccess --example asb-subscribe -- --tag TestChildObject.TestInt
|
||||
```
|
||||
|
||||
## Toolchain & MSRV
|
||||
|
||||
- MSRV is **1.85**, set both in `rust-toolchain.toml` and in `[workspace.package].rust-version`. Both must stay in lock-step; CI fails if they drift. 1.85 is the floor required by the pinned RustCrypto generation (`digest 0.11` / `cipher 0.5` family — `aes 0.9`, `hmac 0.13`, `md-5 0.11`, `sha1 0.11`, `pbkdf2 0.13`) and by the latest `uuid 1.x`; lowering it forces an older crypto generation that conflicts on the resolved `digest`/`cipher` traits.
|
||||
- Edition **2024** (stable since Rust 1.85, 2025-02). Since MSRV is already 1.85, edition 2024 is a free upgrade.
|
||||
- `rustfmt` default config + 100-column lines committed.
|
||||
- `clippy` with `-D warnings` in CI.
|
||||
- `clippy::unwrap_used`, `clippy::expect_used` set to deny in `mxaccess`, `mxaccess-codec`, `mxaccess-rpc`, `mxaccess-nmx`, `mxaccess-asb`. Allowed in tests via `#[cfg(test)]` overrides. UTF-16LE name decoding in `mxaccess-codec` (`MxReferenceHandle` parsing) must use a fallible helper that maps `String::from_utf16` errors into a typed codec error rather than `.unwrap()`-ing — there is no panicking decode path on the hot wire-parse surface.
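
The fallible UTF-16LE helper described in the last item could look roughly like the sketch below. `CodecError` and its variants are hypothetical names; the actual codec error type is defined in `mxaccess-codec`.

```rust
/// Hypothetical typed codec error for name decoding.
#[derive(Debug, PartialEq, Eq)]
pub enum CodecError {
    /// The byte slice cannot be split into whole 16-bit code units.
    OddLength(usize),
    /// The code units contain an unpaired surrogate.
    InvalidUtf16,
}

/// Decodes a UTF-16LE byte slice without any panicking path.
pub fn decode_utf16le(bytes: &[u8]) -> Result<String, CodecError> {
    if bytes.len() % 2 != 0 {
        return Err(CodecError::OddLength(bytes.len()));
    }
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    // `String::from_utf16` fails on lone surrogates; map that to a typed error
    // instead of unwrapping.
    String::from_utf16(&units).map_err(|_| CodecError::InvalidUtf16)
}
```

The indexing inside `chunks_exact(2)` cannot go out of bounds because every chunk has exactly two bytes, so the helper stays compatible with the `unwrap_used`/`expect_used` deny lints.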
|
||||
|
||||
## Feature gates summary
|
||||
|
||||
| Feature | Default | Crate | Effect |
|
||||
|---|---|---|---|
|
||||
| `nmx` | yes (Windows) | `mxaccess` | Enables `NmxTransport`, pulls `mxaccess-nmx` |
|
||||
| `asb` | yes | `mxaccess` | Enables `AsbTransport`, pulls `mxaccess-asb` |
|
||||
| `metrics` | no | `mxaccess` | Emits `metrics` counters/histograms |
|
||||
| `serde` | no | `mxaccess`, `mxaccess-codec` | Derives `Serialize`/`Deserialize` |
|
||||
| `dpapi` | yes (Windows) | `mxaccess-asb` | Provides the Windows DPAPI default `SecretProvider` impl. Off → consumer must supply their own `SecretProvider` (the trait is always present). |
|
||||
| `galaxy-resolver` | yes | `mxaccess-nmx` | Pulls `mxaccess-galaxy` and exposes the SQL-backed `Resolver`. Off → `mxaccess-nmx` ships without `tiberius`/TDS; consumer supplies a custom `Resolver`. |
|
||||
| `auth-windows` | yes (Windows) | `mxaccess-galaxy` | Integrated security (SSPI/`winauth`) for SQL Server. Windows-only; Linux integrated security is a separate stretch goal that requires `tiberius`'s `integrated-auth-gssapi` feature + a configured Kerberos KDC. SQL-login fallback works cross-platform without this feature. |
|
||||
| `windows-com` | yes (Windows) | `mxaccess-rpc` | Uses `windows` crate for GUID/IID helpers |
|
||||
| `live` (test-only) | no | `mxaccess`, `mxaccess-nmx`, `mxaccess-asb` | Enables tests that hit a live AVEVA install |
|
||||
|
||||
## Workspace `Cargo.toml` skeleton
|
||||
|
||||
```toml
|
||||
[workspace]
|
||||
resolver = "2"
|
||||
members = [
|
||||
"crates/mxaccess-codec",
|
||||
"crates/mxaccess-galaxy",
|
||||
"crates/mxaccess-rpc",
|
||||
"crates/mxaccess-callback",
|
||||
"crates/mxaccess-nmx",
|
||||
"crates/mxaccess-asb-nettcp",
|
||||
"crates/mxaccess-asb",
|
||||
"crates/mxaccess",
|
||||
"crates/mxaccess-compat",
|
||||
]
|
||||
|
||||
[workspace.package]
|
||||
edition = "2024"
|
||||
license = "MIT" # resolved 2026-05-05; LICENSE at repo root
|
||||
repository = "https://github.com/<org>/mxaccess"
|
||||
rust-version = "1.85"
|
||||
|
||||
[workspace.dependencies]
|
||||
bytes = "1"
|
||||
byteorder = "1"
|
||||
uuid = { version = "1", features = ["v4", "v7"] }
|
||||
widestring = "1"
|
||||
thiserror = "1"
|
||||
tokio = { version = "1", features = ["net", "io-util", "rt-multi-thread", "sync", "time", "macros"] }
|
||||
tokio-util = { version = "0.7", features = ["codec"] }
|
||||
futures-util = "0.3"
|
||||
tracing = "0.1"
|
||||
async-trait = "0.1"
|
||||
# RustCrypto generation: digest 0.11 / cipher 0.5 line. All crates here are pinned to that
# generation so the resolved `digest` / `cipher` graph is coherent. Moving any one of these
# back to the older 0.10/0.12 line will fail to build; pin the generation, not the individual
# versions. This generation requires `rust-version = "1.85"` (set above in `[workspace.package]`).
|
||||
hmac = "0.13"
|
||||
md-5 = "0.11"
|
||||
sha1 = "0.11" # crate renamed from `sha-1` (deprecated) to `sha1` upstream
|
||||
sha2 = "0.11"
|
||||
rc4 = "0.2" # latest published; on the cipher 0.5 trait reform.
|
||||
rand = "0.8"
|
||||
quick-xml = "0.36" # ASB control-plane XML payloads only (e.g. `request.ToXml()` at
|
||||
# `src/MxAsbClient/AsbSystemAuthenticator.cs:79`); not used for
|
||||
# net.tcp wire framing.
|
||||
aes = "0.9"
|
||||
flate2 = "1"
|
||||
pbkdf2 = "0.13"
|
||||
num-bigint = "0.4" # NOTE: review.md [MAJOR] flags this as not constant-time. The DH
|
||||
# private exponent is long-lived (`AsbSystemAuthenticator.cs:153-166`),
|
||||
# so a side-channel-leaky `mod_exp` is a security regression, not merely a
# missed opportunity. Tracked as an explicit follow-up in `70-risks-and-open-questions.md`
|
||||
# to swap to `crypto-bigint` constant-time `mod_exp` once the wire
|
||||
# round-trips against captured DH handshakes.
|
||||
tiberius = { version = "0.12", features = ["chrono", "tds73"] }
|
||||
# `windows` 0.62 is the current line; 0.58 → 0.62 has breaking renames in
|
||||
# `Win32_System_Rpc` and `Win32_Security_Cryptography` so designing against an older
|
||||
# pin wastes work.
|
||||
windows = { version = "0.62", features = [
|
||||
"Win32_Foundation",
|
||||
"Win32_System_Com",
|
||||
"Win32_Security_Cryptography",
|
||||
"Win32_System_Rpc",
|
||||
] }
|
||||
proptest = "1"
|
||||
metrics = "0.23"
|
||||
serde = { version = "1", features = ["derive"] }
|
||||
arc-swap = "1"
|
||||
```
|
||||
|
||||
Pin minor versions in CI; keep workspace-level dependency table consistent across crates.
|
||||
|
||||
## License
|
||||
|
||||
**MIT** (resolved 2026-05-05; see `70-risks-and-open-questions.md` Q2). `LICENSE` file lives at the project root (`c:\Users\dohertj2\Desktop\mxaccess\LICENSE`). All crate `Cargo.toml`s inherit `license = "MIT"` via `workspace.package` so each crate publishes correctly. Workspace deps listed above are MIT/Apache-2.0 compatible; MIT alone satisfies every dep's downstream license obligation. The `windows` crate proxy/stub IDL re-emissions are flagged for legal review only if vendored from Microsoft headers — not applicable to typical `windows-rs`-generated code.
|
||||
@@ -0,0 +1,357 @@
|
||||
# Protocol invariants — bill of materials
|
||||
|
||||
This is the wire-level spec the Rust port must hit byte-for-byte. Every entry cites its evidence in `src/`, `docs/`, `analysis/`, or `captures/`.
|
||||
|
||||
## COM identifiers
|
||||
|
||||
| Name | GUID | Source |
|
||||
|---|---|---|
|
||||
| `NmxServiceClass` (CLSID) | `AE24BD51-2E80-44CC-905B-E5446C942BEB` | `src/MxNativeClient/NmxComContracts.cs:7` |
|
||||
| `INmxService` (IID) | `575008DB-845D-46C6-A906-F6F8CA86F315` | `src/MxNativeClient/NmxComContracts.cs:24` |
|
||||
| `INmxService2` (IID) | `2630A513-A974-4B1A-8025-457A9A7C56B8` | `src/MxNativeClient/NmxComContracts.cs:51` |
|
||||
| `INmxSvcCallback` (IID) | `B49F92F7-C748-4169-8ECA-A0670B012746` | `src/MxNativeClient/NmxComContracts.cs:84` |
|
||||
| DCE/RPC bind context UUID (initial bind, opnum 0 calls) | `4e0c90df-e39d-4164-a421-ace89484c602` | `docs/Loopback-Protocol-Findings.md:63` |
|
||||
| DCE/RPC service UUID (altered context, main opnums 0/2/3/5) | `1981974b-6bf7-46cb-9640-0260bbb551ba` | `docs/Loopback-Protocol-Findings.md:64` |
|
||||
| Standard NDR transfer syntax v2.0 | `8a885d04-1ceb-11c9-9fe8-08002b104860` | [MS-RPCE] §14.3 |
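
When these identifiers are placed in DCE/RPC PDUs with little-endian data representation, the first three GUID fields are byte-swapped and the last eight bytes are copied as written. The sketch below shows that conversion; `guid_to_wire` is a hypothetical helper name, and the parser is intentionally lenient about dash placement.

```rust
/// Converts a textual GUID such as `2630A513-A974-4B1A-8025-457A9A7C56B8`
/// into its 16-byte little-endian wire layout.
pub fn guid_to_wire(text: &str) -> Option<[u8; 16]> {
    // Dashes are dropped wherever they appear; a stricter parser would also
    // check the 8-4-4-4-12 grouping.
    let hex: String = text.chars().filter(|c| *c != '-').collect();
    if hex.len() != 32 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    let mut raw = [0u8; 16];
    for (i, byte) in raw.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    // Data1 (u32), Data2 (u16) and Data3 (u16) are little-endian on the wire;
    // Data4 (the trailing 8 bytes) keeps its textual order.
    raw[0..4].reverse();
    raw[4..6].reverse();
    raw[6..8].reverse();
    Some(raw)
}
```

For the `INmxService2` IID this yields `13 A5 30 26 74 A9 1A 4B 80 25 45 7A 9A 7C 56 B8`, which is the byte sequence to look for in a capture.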
|
||||
|
||||
## INmxService2 opnums (after IUnknown's 0/1/2)
|
||||
|
||||
`INmxService2` inherits from `INmxService`. Opnums are sequential across the inheritance.
|
||||
|
||||
In the IDL/COM proxy these opnums are sequential because `INmxService2` extends `INmxService` and the derived interface continues the same vtable. In the .NET interop interface (`src/MxNativeClient/NmxComContracts.cs:50–80`) the methods are re-declared with the `new` modifier (`new void RegisterEngine(...)`, `new void UnRegisterEngine(...)`, etc.) so the managed `INmxService2` carries its own vtable slots distinct from the base interface — that managed shadowing is a C# interop detail and does **not** affect the wire opnum table. The Rust port targets the IDL/wire opnums (3..11 below), not the .NET interop vtable.
|
||||
|
||||
| Opnum | Method | Inputs | Outputs |
|---|---|---|---|
| 3 | `RegisterEngine` | engineId(i32), engineName(BSTR), callback(*INmxSvcCallback) | hresult |
| 4 | `UnRegisterEngine` | engineId(i32) | hresult |
| 5 | `Connect` | localEngineId(i32), remoteGalaxyId(i32), remotePlatformId(i32), remoteEngineId(i32) | hresult |
| 6 | `TransferData` | remoteGalaxyId(i32), remotePlatformId(i32), remoteEngineId(i32), size(i32), messageBody(byte[size]) | hresult |
| 7 | `AddSubscriberEngine` | localEngineId(i32), subscriberGalaxyId(i32), subscriberPlatformId(i32), subscriberEngineId(i32) | hresult |
| 8 | `RemoveSubscriberEngine` | same as `AddSubscriberEngine` | hresult |
| 9 | `SetHeartbeatSendInterval` | ticksPerBeat(i32), maxMissedTicks(i32) | hresult |
| 10 | `RegisterEngine2` | engineId(i32), engineName(BSTR), version(i32), callback(*INmxSvcCallback) | hresult |
| 11 | `GetPartnerVersion` | galaxyId(i32), platformId(i32), engineId(i32) | hresult, version(out i32) |
|
||||
Source: `src/MxNativeClient/NmxComContracts.cs:11–80`.
|
||||
|
||||
## INmxSvcCallback opnums
|
||||
|
||||
| Opnum | Method | Inputs | Outputs |
|---|---|---|---|
| 3 | `DataReceived` | bufferSize(i32), dataBuffer(sbyte[bufferSize]) | hresult |
| 4 | `StatusReceived` | bufferSize(i32), statusBuffer(sbyte[bufferSize]) | hresult |
|
||||
Source: `src/MxNativeClient/NmxComContracts.cs:85–92`. Method names match the MIDL signatures at `src/MxNativeClient/NmxSvcCallbackMessages.cs:11-12` and `src/MxNativeClient/NmxProcedureMetadata.cs:89-101` exactly — the `Raw` suffix used in earlier drafts was doc-invented and has been removed.
|
||||
|
||||
## Network ports
|
||||
|
||||
| Endpoint | Port | Notes |
|---|---|---|
| `IObjectExporter` (RPCSS endpoint mapper) | 135/tcp | DCE/RPC over TCP |
| `NmxSvc.exe` static endpoint registration | 5026/tcp+udp | Registered with RPCSS; actual listening socket resolved via OXID at runtime (observed dynamic ports e.g. 49704). |
| Callback server (`mxaccess-callback`) | ephemeral | Embedded in OBJREF dual-string |
| ASB endpoint | configurable, typical `net.tcp://host:5021/...` | Read from `HKLM\ArchestrA\ArchestrAServices\{Solution}` |
|
||||
## NMX TransferData envelope (46 bytes)
|
||||
|
||||
| Offset | Size | Field | Encoding |
|---|---|---|---|
| 0 | 2 | Version | u16 LE = `1` |
| 2 | 4 | InnerLength | i32 LE = `body.len() - 46` |
| 6 | 4 | Reserved | 4 bytes. **Not preserved by the .NET reference**: `Parse` skips bytes 6..10 and `Encode` always writes `0` there (`src/MxNativeCodec/NmxTransferEnvelope.cs:39–75, 91`). The Rust port intentionally **adds** preservation by carrying these four bytes as `reserved6_10: [u8; 4]` through parse/encode (default `[0; 4]` for new envelopes), per the `10-raw-layer.md` envelope section. This closes a CLAUDE.md "preserve unknown bytes" gap that the .NET reference leaves open. |
| 10 | 4 | MessageKind | i32 LE: 1=Metadata, 2=ItemControl, 3=Write |
| 14 | 4 | SourceGalaxyId | i32 LE |
| 18 | 4 | SourcePlatformId | i32 LE |
| 22 | 4 | LocalEngineId | i32 LE |
| 26 | 4 | TargetGalaxyId | i32 LE |
| 30 | 4 | TargetPlatformId | i32 LE |
| 34 | 4 | TargetEngineId | i32 LE |
| 38 | 4 | ProtocolMarker | i32 LE = `0x0000_0201`; on-wire byte sequence `01 02 00 00` (low byte first). Encoded by `BinaryPrimitives.WriteInt32LittleEndian(... ProtocolMarker)` (`src/MxNativeCodec/NmxTransferEnvelope.cs:99`). |
| 42 | 4 | TimeoutMilliseconds | i32 LE (default `30000`) |
|
||||
Source: `src/MxNativeCodec/NmxTransferEnvelope.cs:5–104`.
|
||||
|
||||
⚠ **InnerLength must match actual body size.** The native adapter logs `NMX Header ... buffer size pktHeader.dwDataSize N doesn't match received message size of 46` (`work_remain.md:74–85`) when it does not. The encoder validates the relationship before transmitting; envelope-only sends with `InnerLength == 0` are rejected unless explicitly opted into for diagnostics.
|
||||
|
||||
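The envelope table above can be sketched as a small encode/parse pair. This is a minimal illustration under stated assumptions, not the `mxaccess-codec` implementation: the `Envelope` struct, the `ids` array grouping of the six id fields, and the `String` error type are hypothetical names chosen for brevity, and Version/ProtocolMarker validation is deliberately left out because the table does not say how the reference parser treats mismatches.

```rust
/// Fixed size of the NMX TransferData envelope.
pub const ENVELOPE_LEN: usize = 46;
/// ProtocolMarker; little-endian on the wire, i.e. bytes `01 02 00 00`.
pub const PROTOCOL_MARKER: i32 = 0x0000_0201;

/// Hypothetical in-memory view of the envelope (names are illustrative).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Envelope {
    /// Bytes 6..10, carried through parse/encode unchanged.
    pub reserved6_10: [u8; 4],
    pub message_kind: i32,
    /// Offsets 14..38: source galaxy, source platform, local engine,
    /// target galaxy, target platform, target engine.
    pub ids: [i32; 6],
    pub timeout_ms: i32,
}

fn le_i32(b: &[u8], o: usize) -> i32 {
    i32::from_le_bytes([b[o], b[o + 1], b[o + 2], b[o + 3]])
}

/// Writes the 46-byte envelope followed by `inner`; InnerLength is derived
/// from `inner.len()` so it cannot disagree with the body size.
pub fn encode(env: &Envelope, inner: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(ENVELOPE_LEN + inner.len());
    out.extend_from_slice(&1u16.to_le_bytes()); // Version
    out.extend_from_slice(&(inner.len() as i32).to_le_bytes()); // InnerLength
    out.extend_from_slice(&env.reserved6_10);
    out.extend_from_slice(&env.message_kind.to_le_bytes());
    for id in env.ids.iter() {
        out.extend_from_slice(&id.to_le_bytes());
    }
    out.extend_from_slice(&PROTOCOL_MARKER.to_le_bytes());
    out.extend_from_slice(&env.timeout_ms.to_le_bytes());
    out.extend_from_slice(inner);
    out
}

/// Splits a full message into envelope and inner body, rejecting any
/// InnerLength that does not equal `body.len() - 46`.
pub fn parse(body: &[u8]) -> Result<(Envelope, &[u8]), String> {
    if body.len() < ENVELOPE_LEN {
        return Err(format!("body too short: {} bytes", body.len()));
    }
    let declared = le_i32(body, 2);
    let actual = body.len() - ENVELOPE_LEN;
    if declared < 0 || declared as usize != actual {
        return Err(format!("inner length {} != body length {}", declared, actual));
    }
    let mut ids = [0i32; 6];
    for (k, id) in ids.iter_mut().enumerate() {
        *id = le_i32(body, 14 + 4 * k);
    }
    let env = Envelope {
        reserved6_10: [body[6], body[7], body[8], body[9]],
        message_kind: le_i32(body, 10),
        ids,
        timeout_ms: le_i32(body, 42),
    };
    Ok((env, &body[ENVELOPE_LEN..]))
}
```

Deriving InnerLength inside `encode` rather than accepting it as a field is one way to make the mismatch logged by the native adapter unrepresentable on the send path; the diagnostic opt-in for envelope-only sends would need a separate entry point.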
## MxReferenceHandle (20 bytes)
|
||||
|
||||
| Offset | Size | Field | Encoding |
|---|---|---|---|
| 0 | 1 | GalaxyId | u8 |
| 1 | 1 | Reserved | u8 = `0` |
| 2 | 2 | PlatformId | u16 LE |
| 4 | 2 | EngineId | u16 LE |
| 6 | 2 | ObjectId | u16 LE |
| 8 | 2 | ObjectSignature | u16 LE = CRC-16/IBM(lowercase UTF-16LE objectTagName) |
| 10 | 2 | PrimitiveId | i16 LE |
| 12 | 2 | AttributeId | i16 LE |
| 14 | 2 | PropertyId | i16 LE |
| 16 | 2 | AttributeSignature | u16 LE = CRC-16/IBM(lowercase UTF-16LE attributeName) |
| 18 | 2 | AttributeIndex | i16 LE = `-1` if array, `0` otherwise |
|
||||
CRC-16/IBM polynomial `0xa001` (right-shifted variant). **Initial value `0`** (`src/MxNativeCodec/MxReferenceHandle.cs:51` — literal `ushort crc = 0`, not `0xFFFF`). For each `char` of `name.ToLowerInvariant()`, apply the low byte then the high byte of the UTF-16LE representation (`src/MxNativeCodec/MxReferenceHandle.cs:52–56`).
|
||||
|
||||
Source: `src/MxNativeCodec/MxReferenceHandle.cs:5–120`. Per-char loop at `MxReferenceHandle.cs:47–59`; inner CRC byte step at `MxReferenceHandle.cs:108–119`.
|
||||
|
||||
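The signature computation described above can be sketched as follows. It assumes the inner byte step is the standard reflected CRC-16 update (polynomial `0xA001`, initial value `0`, no final XOR), which is what the stated parameters imply; the function names are hypothetical. One caveat to keep in mind: Rust's `str::to_lowercase` is Unicode-aware and may differ from .NET's `ToLowerInvariant` for some non-ASCII characters, so parity for such names would need to be checked against the reference.

```rust
/// Reflected CRC-16 (polynomial 0xA001), initial value 0, no final XOR.
pub fn crc16_ibm(bytes: &[u8]) -> u16 {
    let mut crc: u16 = 0;
    for &b in bytes {
        crc ^= b as u16;
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0xA001 } else { crc >> 1 };
        }
    }
    crc
}

/// Signature of a tag or attribute name: lowercase it, then feed each
/// UTF-16 code unit to the CRC low byte first, high byte second.
pub fn name_signature(name: &str) -> u16 {
    let mut bytes = Vec::with_capacity(name.len() * 2);
    for unit in name.to_lowercase().encode_utf16() {
        bytes.extend_from_slice(&unit.to_le_bytes());
    }
    crc16_ibm(&bytes)
}
```

Because the name is lowercased first, `name_signature("TestInt")` and `name_signature("testint")` yield the same value, so handles built from differently cased references compare equal.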
## Item-control bodies
|
||||
|
||||
| Command | Opcode | Length |
|---|---|---|
| AdviseSupervisory | `0x1f` | 39 bytes (HeaderLength 3 + GUID 16 + AdviseExtra 2 + Payload 18) |
| UnAdvise | `0x21` | 37 bytes (HeaderLength 3 + GUID 16 + Payload 18) |
|
||||
The enum defines `Advise = 0x1f` and `AdviseSupervisory = 0x1f` as the same opcode (`src/MxNativeCodec/NmxItemControlMessage.cs:7–8`), but `NmxItemControlMessage.Parse` only accepts `AdviseSupervisory` or `UnAdvise` and rejects any other command byte (`src/MxNativeCodec/NmxItemControlMessage.cs:46–49`). A 37-byte `0x1f` "plain advise" body is **not** a wire shape the codec produces or accepts. The compatibility layer's `AdviseSupervisory` method forwards to `Advise`, both encoded as the 39-byte AdviseSupervisory body (`src/MxNativeClient/MxNativeCompatibilityServer.cs:256–258`).
|
||||
|
||||
AdviseSupervisory layout: `cmd(1) + version u16(2) + correlation(GUID 16) + adviseExtra(2) + handle_projection(14, bytes 6..19 of MxReferenceHandle) + tail u32(4)` (`src/MxNativeCodec/NmxItemControlMessage.cs:25–35,121–142`). The 2-byte `adviseExtra` is omitted for `UnAdvise`.
|
||||
|
||||
**`tail` constant value: `3` (u32 LE = `03 00 00 00`).** `NmxItemControlMessage.FromReferenceHandle` defaults the parameter to `tail = 3` (`src/MxNativeCodec/NmxItemControlMessage.cs:88`) and every call site in the .NET reference relies on that default. The Rust port must emit the literal value `3` for both `AdviseSupervisory` and `UnAdvise`; emitting any other value will be rejected by the responding NMX.
|
||||
|
||||
Source: `src/MxNativeCodec/NmxItemControlMessage.cs:5–154`.
|
||||
|
||||
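As a rough sketch of the two item-control shapes, the encoder below lays the fields out in the order given above. The `version` and `advise_extra` values are taken as parameters because this section does not fix them; the function name and signature are hypothetical, and the correlation GUID is assumed to arrive already in its 16-byte wire form.

```rust
pub const ADVISE_SUPERVISORY: u8 = 0x1f;
pub const UNADVISE: u8 = 0x21;
/// Constant tail value, emitted as `03 00 00 00`.
pub const TAIL: u32 = 3;

/// Encodes an item-control body. Returns `None` for any command other than
/// AdviseSupervisory or UnAdvise, mirroring the parser's rejection.
pub fn encode_item_control(
    command: u8,
    version: u16,
    correlation: &[u8; 16],
    advise_extra: [u8; 2],
    handle: &[u8; 20],
) -> Option<Vec<u8>> {
    if command != ADVISE_SUPERVISORY && command != UNADVISE {
        return None;
    }
    let mut out = Vec::with_capacity(39);
    out.push(command); // cmd(1)
    out.extend_from_slice(&version.to_le_bytes()); // version u16(2)
    out.extend_from_slice(correlation); // correlation GUID(16)
    if command == ADVISE_SUPERVISORY {
        out.extend_from_slice(&advise_extra); // adviseExtra(2), Advise only
    }
    out.extend_from_slice(&handle[6..20]); // handle projection(14)
    out.extend_from_slice(&TAIL.to_le_bytes()); // tail u32(4)
    Some(out)
}
```

The resulting lengths are 39 bytes for `0x1f` and 37 bytes for `0x21`, matching the table.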
## Write bodies (`0x37`, normal)
|
||||
|
||||
Common prefix (18 bytes): `cmd(1=0x37) + version u16(2=1) + handle_projection(14) + wireKind(1)` (`src/MxNativeCodec/NmxWriteMessage.cs:11–13,207–213`). `HandleProjectionOffset = 3`, `HandleProjectionLength = 14`, `KindOffset = 17`. There is **no** padding between version and the handle projection; the handle projection is bytes 6..19 of the 20-byte `MxReferenceHandle` written directly at offset 3.
|
||||
|
||||
Normal scalar suffix (14 bytes + writeIndex): `[-1 i16] + filler(8 zero bytes) + clientToken u32(4) + writeIndex i32(4)` (`src/MxNativeCodec/NmxWriteMessage.cs:215–226`).
|
||||
|
||||
Boolean has its own 11-byte suffix instead, followed by the same 4-byte writeIndex: `7 zero bytes + clientToken u32(4)`, then `writeIndex i32(4)` (`src/MxNativeCodec/NmxWriteMessage.cs:228–238`).
|
||||
|
||||
| WireKind | Type | Value section | Total |
|---|---|---|---|
| `0x01` | Boolean | 4 bytes literal `[0xff,0xff,0xff,0x00]` (true) or `[0x00,0xff,0xff,0x00]` (false). Bytes 1 and 2 are `0xFF` filler, NOT reserved zeros (`src/MxNativeCodec/NmxWriteMessage.cs:257`). | 37 (`KindOffset(17) + 1 + 4 + 11 + 4`, `src/MxNativeCodec/NmxWriteMessage.cs:121–128`) |
| `0x02` | Int32 | 4 bytes LE | 40 |
| `0x03` | Float32 | 4 bytes IEEE | 40 |
| `0x04` | Float64 | 8 bytes IEEE | 44 |
| `0x05` | String | recordLength i32(4) + valueByteLength i32(4) + UTF-16LE bytes(N) + null(2) | **44 + N** (`KindOffset(17) + 1 + 4 + 4 + N + 14 + 4`, `src/MxNativeCodec/NmxWriteMessage.cs:148–157`) |
| `0x05` | DateTime | Same shape as String; value is UTF-16LE of `DateTime.ToString("M/d/yyyy h:mm:ss tt", InvariantCulture)` + null (`src/MxNativeCodec/NmxWriteMessage.cs:262,390–393`) | **44 + N** |
| `0x41` | BoolArray | 4 unused bytes + count u16 at body[22] + elementWidth u16 at body[24] + elements at body[28] | 18 (prefix) + 10 (array header) + 2N + 14-byte suffix + 4 writeIndex |
| `0x42` | Int32Array | same shape, 4-byte elements | 18 + 10 + 4N + 14 + 4 |
| `0x43` | Float32Array | same | 18 + 10 + 4N + 14 + 4 |
| `0x44` | Float64Array | same, 8-byte elements | 18 + 10 + 8N + 14 + 4 |
| `0x45` | StringArray / DateTimeArray | per-element length-prefixed records (`src/MxNativeCodec/NmxWriteMessage.cs:346–362`) | 18 + 10 + Σ(per-element) + 14 + 4 |

**BoolArray element encoding (writes and reads agree):** each element is a little-endian `i16`, **not** a single byte. `true` → `-1` → `[0xFF, 0xFF]`; `false` → `0` → `[0x00, 0x00]`. Encoder: `BinaryPrimitives.WriteInt16LittleEndian(..., values[i] ? (short)-1 : (short)0)` (`src/MxNativeCodec/NmxWriteMessage.cs:307`). Decoder: `BinaryPrimitives.ReadInt16LittleEndian(...) != 0` (`src/MxNativeCodec/NmxSubscriptionMessage.cs:282, 290`). Element width is therefore 2 bytes (the `2N` in the BoolArray total), and the array-header `elementWidth` field carries `2`.
|
||||
**Encoder vs decoder asymmetry for array headers (preserve verbatim):**
|
||||
The encoder writes `count` as **u16 at body[22]** and `elementWidth` as **u16 at body[24]** — both 2-byte little-endian values (`src/MxNativeCodec/NmxWriteMessage.cs:181–182`). The subscription/callback decoder, however, reads `count` as **u16 at body+4** and `elementWidth` as **i32 at body+6** — a 4-byte little-endian read (`src/MxNativeCodec/NmxSubscriptionMessage.cs:264–265`). Because the high u16 of the encoder's `[count, elementWidth]` slot is the small `elementWidth` value and the bytes at offsets 26..27 are zero (no other writes target that slot — `NmxWriteMessage.cs:170–186`), an i32 read at body+6 of a captured write body sees the same numeric value. A Rust port must replicate the asymmetry exactly: write u16/u16, read u16+i32, do not normalize either side.
|
||||
|
||||
Source: `src/MxNativeCodec/NmxWriteMessage.cs:7–394`. Per-type matrices: `analysis/frida/write-body-matrix.tsv`, `write-array-body-matrix.tsv`, `write-mode-matrix.tsv`.
|
||||
|
||||
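To make the prefix/value/suffix arithmetic concrete, here is a minimal sketch of an Int32 write body (18-byte prefix, 4-byte value, 14-byte normal suffix, 4-byte writeIndex, 40 bytes in total), plus the BoolArray element encoding. Function names are hypothetical and only the layout stated in this section is relied upon.

```rust
/// Builds a normal (non-timestamped) Int32 write body of 40 bytes.
pub fn encode_int32_write(
    handle: &[u8; 20],
    value: i32,
    client_token: u32,
    write_index: i32,
) -> Vec<u8> {
    let mut body = Vec::with_capacity(40);
    // Common prefix (18 bytes).
    body.push(0x37); // cmd
    body.extend_from_slice(&1u16.to_le_bytes()); // version = 1
    body.extend_from_slice(&handle[6..20]); // handle projection at offset 3
    body.push(0x02); // wireKind = Int32 at offset 17
    // Value section (4 bytes LE).
    body.extend_from_slice(&value.to_le_bytes());
    // Normal suffix (14 bytes): -1 i16, 8 zero filler bytes, clientToken.
    body.extend_from_slice(&(-1i16).to_le_bytes());
    body.extend_from_slice(&[0u8; 8]);
    body.extend_from_slice(&client_token.to_le_bytes());
    // writeIndex (4 bytes).
    body.extend_from_slice(&write_index.to_le_bytes());
    body
}

/// BoolArray elements: each bool becomes a little-endian i16,
/// -1 (`FF FF`) for true and 0 (`00 00`) for false.
pub fn encode_bool_elements(values: &[bool]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 2);
    for &v in values {
        let element: i16 = if v { -1 } else { 0 };
        out.extend_from_slice(&element.to_le_bytes());
    }
    out
}
```

The same skeleton covers Float32 and Float64 by swapping the wire kind and value bytes; Boolean needs its own value literal and the shorter suffix described above, so it is not a drop-in variation.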
## Write2 (timestamped)
|
||||
|
||||
Same as `Write` but the 18-byte trailer (14-byte suffix + 4-byte writeIndex) is rewritten by `WriteTimestampedSuffix` (`src/MxNativeCodec/NmxWriteMessage.cs:240–251`). The trailer is **not** lengthened — bytes are repacked, not inserted:
|
||||
|
||||
| Trailer offset | Size | Normal `Write` | `Write2` (timestamped) |
|---|---|---|---|
| 0 | 2 | `-1 i16` (`NmxWriteMessage.cs:222`) | **`0 i16`** (`NmxWriteMessage.cs:247`) |
| 2 | 8 | 8 zero filler bytes (`NmxWriteMessage.cs:223–225`, `WriteNormalSuffix` zero-init) | **8-byte `FILETIME` from `timestamp.ToFileTime()`** (`NmxWriteMessage.cs:248`) |
| 10 | 4 | `clientToken u32` (`NmxWriteMessage.cs:225`) | `clientToken u32` (`NmxWriteMessage.cs:249`) |
| 14 | 4 | `writeIndex i32` (`NmxWriteMessage.cs:226`) | `writeIndex i32` (`NmxWriteMessage.cs:250`) |
|
||||
The FILETIME **replaces** the eight-byte filler that `WriteNormalSuffix` would otherwise leave zero; nothing is inserted between offsets 12 and 19. Total body size is identical to the corresponding non-timestamped `Write`.
|
||||
|
||||
Source: `src/MxNativeCodec/NmxWriteMessage.cs:240–251` (`WriteTimestampedSuffix`); compare to `WriteNormalSuffix` at `src/MxNativeCodec/NmxWriteMessage.cs:215–226`.
|
||||
|
||||
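The repacking can be illustrated with a single function that produces either trailer. This is a sketch with hypothetical names; the FILETIME is assumed to be written little-endian like every other integer in these bodies, and the Unix-to-FILETIME helper uses the standard 1601-01-01 epoch offset in 100 ns ticks.

```rust
/// 100 ns ticks between 1601-01-01 and 1970-01-01 (both UTC).
const FILETIME_UNIX_EPOCH: i64 = 116_444_736_000_000_000;

/// Converts Unix milliseconds to a Windows FILETIME tick count.
pub fn unix_millis_to_filetime(ms: i64) -> i64 {
    FILETIME_UNIX_EPOCH + ms * 10_000
}

/// Builds the 18-byte trailer. `None` yields the normal `Write` form,
/// `Some(filetime)` the timestamped `Write2` form; the length never changes.
pub fn write_trailer(
    timestamp_filetime: Option<i64>,
    client_token: u32,
    write_index: i32,
) -> [u8; 18] {
    let mut t = [0u8; 18];
    match timestamp_filetime {
        // Normal: -1 i16 at offset 0, offsets 2..10 stay zero.
        None => t[0..2].copy_from_slice(&(-1i16).to_le_bytes()),
        // Timestamped: offset 0 stays 0 i16, FILETIME fills offsets 2..10.
        Some(ft) => t[2..10].copy_from_slice(&ft.to_le_bytes()),
    }
    t[10..14].copy_from_slice(&client_token.to_le_bytes());
    t[14..18].copy_from_slice(&write_index.to_le_bytes());
    t
}
```

Returning a fixed-size `[u8; 18]` for both forms encodes the "repacked, not inserted" property in the type.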
## WriteSecured2 (`0x38`)
|
||||
|
||||
`Write2` body (without trailing clientToken+writeIndex), then:
|
||||
|
||||
```
currentUserToken(16) + clientNameLen(i32) + clientNameBytes(UTF-16LE+null)
+ verifierUserToken(16) + (-1 i16) + clientToken(u32) + writeIndex(i32)
```
|
||||
Observed authenticated user token (sample): `07 b9 a9 f4 72 6e ae 48 83 b5 bb de 91 8c 89 0f` (`captures/036-frida-secured*`).
|
||||
|
||||
Source: `src/MxNativeCodec/NmxSecuredWrite2Message.cs:6–105`.
|
||||
|
||||
## SubscriptionStatus (`0x32`)
|
||||
|
||||
```
cmd(1=0x32) + version(2=1) + recordCount(i32) + operationId(GUID 16)
+ correlationId(GUID 16) + records[recordCount]
record: status(i32) + detailStatus(i32) + quality(u16)
        + timestamp_filetime(i64) + wireKind(u8) + value(N)
```
|
||||
**Header length: 39 bytes** = cmd(1) + version(2) + recordCount(4) + operationId(16) + correlationId(16). Records start at byte offset 39 (`src/MxNativeCodec/NmxSubscriptionMessage.cs:98–99`, `for (int i = 0; i < recordCount; i++)` over `offset = 39`).
|
||||
|
||||
## DataUpdate (`0x33`)
|
||||
|
||||
```
cmd(1=0x33) + version(2=1) + recordCount(i32) + operationId(GUID 16)
+ records[recordCount]
record: status(i32) + quality(u16) + timestamp_filetime(i64) + wireKind(u8) + value(N)
```
|
||||
**Header length: 23 bytes** = cmd(1) + version(2) + recordCount(4) + operationId(16). There is **no** per-message correlationId on `0x33`; the record starts at byte offset 23 (`src/MxNativeCodec/NmxSubscriptionMessage.cs:54–55, 76` — `recordCount` read at offset 3, `operationId` at offset 7, `recordOffset = 23`). The 16-byte difference between the two header lengths (39 − 23) is exactly the `0x32`-only correlationId slot.
|
||||
|
||||
⚠ **Hard invariant: `recordCount == 1` for `0x33` DataUpdate.** The .NET parser throws `ArgumentException` on any other value (`src/MxNativeCodec/NmxSubscriptionMessage.cs:71–74` — `if (recordCount != 1) throw`). The Rust port replicates this as a typed error rather than degrading to opaque-bytes preservation, matching the executable spec. Multi-sample buffered batches are tracked as not-yet-wire-proven in `70-risks-and-open-questions.md` (R2/R13); only single-record DataUpdate frames have been observed in `captures/`.
|
||||
|
||||
Wire kinds **0x01..0x07** (scalars) and **0x41..0x46** (arrays). The set is asymmetric across encode/decode: the write-side encoder collapses both `StringArray` and `DateTimeArray` to `0x45` and never emits `0x46` (`src/MxNativeCodec/NmxWriteMessage.cs:107`); the subscription/callback decoder accepts and demuxes `0x46` as `DateTimeArray` (`src/MxNativeCodec/NmxSubscriptionMessage.cs:173,275`). **Writes use `0x41..0x45` only; reads/callbacks accept `0x41..0x46`.**
|
||||
|
||||
Source: `src/MxNativeCodec/NmxSubscriptionMessage.cs:5–428`.
|
||||
|
||||
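The two header shapes and the `recordCount == 1` invariant can be sketched as one header parser. This is illustrative only: `FrameHeader` and the `String` error are hypothetical stand-ins for whatever typed error the port defines, and record decoding is left out.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct FrameHeader {
    pub command: u8,
    pub version: u16,
    pub record_count: i32,
    pub operation_id: [u8; 16],
    /// Present only on 0x32 SubscriptionStatus frames.
    pub correlation_id: Option<[u8; 16]>,
    /// Byte offset of the first record: 39 for 0x32, 23 for 0x33.
    pub records_offset: usize,
}

fn guid_at(body: &[u8], offset: usize) -> [u8; 16] {
    let mut id = [0u8; 16];
    id.copy_from_slice(&body[offset..offset + 16]);
    id
}

pub fn parse_header(body: &[u8]) -> Result<FrameHeader, String> {
    let command = match body.first() {
        Some(&c) => c,
        None => return Err("empty body".to_string()),
    };
    let header_len = match command {
        0x32 => 39,
        0x33 => 23,
        other => return Err(format!("unexpected opcode {:#x}", other)),
    };
    if body.len() < header_len {
        return Err(format!("need {} header bytes, got {}", header_len, body.len()));
    }
    let record_count = i32::from_le_bytes([body[3], body[4], body[5], body[6]]);
    // Hard invariant for DataUpdate: exactly one record per frame.
    if command == 0x33 && record_count != 1 {
        return Err(format!("0x33 recordCount must be 1, got {}", record_count));
    }
    Ok(FrameHeader {
        command,
        version: u16::from_le_bytes([body[1], body[2]]),
        record_count,
        operation_id: guid_at(body, 7),
        correlation_id: if command == 0x32 { Some(guid_at(body, 23)) } else { None },
        records_offset: header_len,
    })
}
```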
## Reference registration (`0x10` / `0x11`)
|
||||
|
||||
Request (`0x10`) — fixed 55-byte header, then variable strings, then 20-byte tail. `HeaderLength = 55` (`src/MxNativeCodec/NmxReferenceRegistrationMessage.cs:15`). The codec only writes at six explicit offsets inside the header (`NmxReferenceRegistrationMessage.cs:80–87`); all other bytes within `[0..55)` are left as zero from the freshly-allocated buffer (`NmxReferenceRegistrationMessage.cs:71–78`) and are preserved verbatim per CLAUDE.md unknown-bytes rule.
|
||||
|
||||
| Offset | Size | Field | Encoding / source |
|---|---|---|---|
| 0 | 1 | Command | u8 = `0x10` (`NmxReferenceRegistrationMessage.cs:80`) |
| 1 | 2 | Version | u16 LE = `1` (`NmxReferenceRegistrationMessage.cs:81`) |
| 3 | 4 | ItemHandle | i32 LE (`NmxReferenceRegistrationMessage.cs:82`) |
| 7 | 16 | ItemCorrelationId | GUID bytes (`NmxReferenceRegistrationMessage.cs:83`) |
| 23 | 2 | `-1 i16` marker | i16 LE = `-1` (`NmxReferenceRegistrationMessage.cs:85`) |
| 25 | 2 | Reserved gap | preserved as `0`; never written explicitly, zero-init from `new byte[]` (`NmxReferenceRegistrationMessage.cs:71–78`). Preserved verbatim per CLAUDE.md unknown-bytes rule. |
| 27 | 4 | Constant `1 i32` | i32 LE = `1` (`NmxReferenceRegistrationMessage.cs:86`) |
| 31 | 24 | Reserved gap | preserved as `0`; never written explicitly, zero-init from `new byte[]` (`NmxReferenceRegistrationMessage.cs:71–78`). The `Parse` method reads these bytes implicitly only via `body.Slice(offset, ItemStringReservedLength)` further on; the prefix range itself is round-tripped untouched. Preserved verbatim per CLAUDE.md unknown-bytes rule. |
|
||||
After the 55-byte prefix:
|
||||
|
||||
```
itemDefinition(taggedString: i32 length with high byte 0x81 + UTF-16LE + null)
itemStringReserved(8 bytes; Parse asserts all-zero, NmxReferenceRegistrationMessage.cs:42–47)
itemContext(untaggedString: i32 length + UTF-16LE + null)
tail(20 bytes; first 19 zero, tail[19] = subscribe_flag, NmxReferenceRegistrationMessage.cs:54–56,92)
```
|
||||
Result (`0x11`):
|
||||
```
cmd(1) + version(2) + itemHandle(i32) + correlation(GUID 16)
+ firstTimestamp(i64) + secondTimestamp(i64)
+ statusCategory(u8) + statusDetail(u8)
+ blockLength(i32)
+ itemDefinition(tagged) + mxDataType(i32) + reserved(6)
+ itemContext(untagged)
+ tail(16 zero)
```
|
||||
**Tail = 16 zero bytes is a hard parser invariant.** `NmxReferenceRegistrationResultMessage.Parse` asserts both that the trailing slice is exactly `TailLength = 16` (`src/MxNativeCodec/NmxReferenceRegistrationResultMessage.cs:21, 59–62`) and that every byte in it is `0` — `body.Slice(offset, TailLength).IndexOfAnyExcept((byte)0) >= 0` throws `ArgumentException` (`NmxReferenceRegistrationResultMessage.cs:64–67`). Wire-confirmed against the `0x11` registration-result frames captured under `captures/080-frida-buffered-external-write-testint/` (per `docs/Capture-Run-2026-04-25.md:480–486`, which records that this capture supplied the stable normal-and-buffered `0x10` registration bodies *plus the matching `0x11` registration-result frames* used by the `NmxReferenceRegistrationResultMessage` tests — the same parser whose all-zero-tail assertion would have rejected those captures had the wire bytes been non-zero). Per CLAUDE.md's preserve-unknown-bytes rule the Rust port still carries them as 16 explicit bytes (not skipped) so non-zero tails surface as a typed parse error rather than silent acceptance.
|
||||
|
||||
Tagged-string encoding: `4-byte length: tagged ? (byteLength | 0x81000000) : byteLength` followed by UTF-16LE bytes + null terminator.
|
||||
|
||||
Source: `src/MxNativeCodec/NmxReferenceRegistrationMessage.cs:6–142`, `NmxReferenceRegistrationResultMessage.cs:6–120`.
|
||||
|
||||
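A minimal sketch of the fixed `0x10` header and the tagged length word, using only the offsets and constants listed above. The function names are hypothetical, and the sketch deliberately stops before the variable-length strings, since whether `byteLength` counts the null terminator is not stated in this section.

```rust
pub const HEADER_LEN: usize = 55;

/// Length word for a registration string: tagged strings set the high
/// byte to 0x81, untagged strings carry the plain byte length.
pub fn tag_length(byte_length: u32, tagged: bool) -> u32 {
    if tagged { byte_length | 0x8100_0000 } else { byte_length }
}

/// Builds the 55-byte request header. Only the six documented offsets are
/// written; every other byte stays zero, as in the reference encoder.
pub fn registration_header(item_handle: i32, correlation: &[u8; 16]) -> [u8; HEADER_LEN] {
    let mut h = [0u8; HEADER_LEN];
    h[0] = 0x10; // Command
    h[1..3].copy_from_slice(&1u16.to_le_bytes()); // Version
    h[3..7].copy_from_slice(&item_handle.to_le_bytes()); // ItemHandle
    h[7..23].copy_from_slice(correlation); // ItemCorrelationId
    h[23..25].copy_from_slice(&(-1i16).to_le_bytes()); // -1 marker
    // h[25..27] reserved gap, left zero
    h[27..31].copy_from_slice(&1i32.to_le_bytes()); // constant 1
    // h[31..55] reserved gap, left zero
    h
}
```

A parser built on the same table would keep the two reserved gaps as explicit byte arrays rather than skipping them, so that non-zero values seen on the wire can be round-tripped.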
## ASB Variant
|
||||
|
||||
| Offset | Size | Field |
|---|---|---|
| 0 | 2 | Type ID (u16 LE, AsbDataType) |
| 2 | 4 | Length (i32 LE, logical) |
| 6 | 4 | PayloadLength (i32 LE, byte count) |
| 10 | N | Payload |
|
||||
AsbDataType IDs (live-proven): Bool=17, Int32=4, Float=8, Double=9, String=10 (UTF-16LE), DateTime=11 (FILETIME), Duration=12 (.NET Ticks), Int32Array=44, FloatArray=48, DoubleArray=49, StringArray=50, DateTimeArray=51, DurationArray=52, BoolArray=57.
|
||||
|
||||
Source: `src/MxAsbClient/AsbContracts.cs:1169–1293`, `docs/ASB-Variant-Wire-Format.md`.
|
||||
|
||||
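Reading the 10-byte Variant header might look like the sketch below. The `Variant` struct and `parse_variant` are hypothetical names, the payload is returned undecoded, and the meaning of the logical `Length` field for each type is left to the referenced wire-format document.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct Variant<'a> {
    /// AsbDataType id, e.g. 4 = Int32, 10 = String.
    pub type_id: u16,
    /// Logical length as carried on the wire (not interpreted here).
    pub length: i32,
    /// Exactly PayloadLength bytes following the header.
    pub payload: &'a [u8],
}

/// Parses one Variant from the start of `buf` and returns it together
/// with the number of bytes consumed.
pub fn parse_variant<'a>(buf: &'a [u8]) -> Result<(Variant<'a>, usize), String> {
    if buf.len() < 10 {
        return Err(format!("need 10 header bytes, got {}", buf.len()));
    }
    let type_id = u16::from_le_bytes([buf[0], buf[1]]);
    let length = i32::from_le_bytes([buf[2], buf[3], buf[4], buf[5]]);
    let payload_len = i32::from_le_bytes([buf[6], buf[7], buf[8], buf[9]]);
    if payload_len < 0 {
        return Err(format!("negative payload length {}", payload_len));
    }
    let end = 10 + payload_len as usize;
    if buf.len() < end {
        return Err(format!("payload truncated: need {} bytes, got {}", end, buf.len()));
    }
    Ok((Variant { type_id, length, payload: &buf[10..end] }, end))
}
```

Returning the consumed byte count lets a caller walk a buffer that holds several Variants back to back without re-deriving offsets.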
## ASB AsbStatus
|
||||
|
||||
| Offset | Size | Field |
|---|---|---|
| 0 | 1 | Count (sbyte: -1 marker-only, 0..N elements) |
| 1 | 4 | PayloadLength (u32 LE) |
| 5 | N | Payload (packed status elements) |
|
||||
Status element: marker byte (high bit clear = value follows), low 7 bits = type ID. Known IDs: 5=MxStatusCategory, 6=MxStatusDetail, 7=MxQuality. If value present, 2-byte u16 LE follows.
|
||||
|
||||
Quality bits (mask `0x00C0`): `0xC0`=Good, `0x40`=Uncertain, `0x00`=Bad.
|
||||
|
||||
Source: `src/MxAsbClient/AsbContracts.cs:1106–1167`, `src/MxAsbClient/AsbPublishedValue.cs:87–119`.
|
||||
|
||||
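The element walk and the quality mask can be sketched as follows, under the rules stated above (high bit clear means a 2-byte value follows; low 7 bits are the type id). Names are hypothetical, and the `Other` variant is an assumption added here to avoid guessing a meaning for masked values the section does not list, such as `0x80`.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Quality {
    Good,
    Uncertain,
    Bad,
    /// Masked value not covered by the documented three (assumption).
    Other(u16),
}

/// Classifies a quality word using mask 0x00C0.
pub fn classify_quality(quality: u16) -> Quality {
    match quality & 0x00C0 {
        0xC0 => Quality::Good,
        0x40 => Quality::Uncertain,
        0x00 => Quality::Bad,
        other => Quality::Other(other),
    }
}

/// Walks packed status elements, yielding `(type_id, value)` pairs where
/// `value` is `None` for marker-only elements.
pub fn parse_status_elements(payload: &[u8]) -> Result<Vec<(u8, Option<u16>)>, String> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < payload.len() {
        let marker = payload[i];
        i += 1;
        let type_id = marker & 0x7F;
        if (marker & 0x80) == 0 {
            // High bit clear: a u16 LE value follows the marker.
            if i + 2 > payload.len() {
                return Err(format!("value for type {} truncated", type_id));
            }
            out.push((type_id, Some(u16::from_le_bytes([payload[i], payload[i + 1]]))));
            i += 2;
        } else {
            out.push((type_id, None));
        }
    }
    Ok(out)
}
```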
## Authentication
|
||||
|
||||
### NTLMv2 (NMX/DCE-RPC)
|
||||
|
||||
- Negotiate flags: Unicode | RequestTarget | Sign | Seal | ExtendedSessionSecurity | Negotiate128 | KeyExchange
- NTLMv2 NT-OWF = `HMAC-MD5(MD4(unicode(password)), unicode(uppercase(user) + domain))`
- Type3 with AV pairs from server's TargetInfo (channel binding optional)
- Packet integrity: `HMAC-MD5(SignKey, sequence || plaintext)` → first 8 bytes XOR with RC4 keystream
- Sign-key / seal-key: MD5 over magic constants
|
||||
Source: `src/MxNativeClient/ManagedNtlmClientContext.cs:1–389`. Reference: [MS-NLMP].
|
||||
|
||||
### ASB application-level
|
||||
|
||||
1. DH key exchange (prime, generator, key size in `HKLM\ArchestrA\ArchestrAServices\{Solution}`).
2. Shared secret = `DH(remote_pub, local_priv, prime)`.
3. AES key = `SHA1(shared_secret || passphrase)`.
4. Per-message HMAC (algorithm in registry: MD5/SHA1/SHA512).
5. AES-128 message encryption with `PBKDF2(passphrase, salt, 1000 iters, SHA1)`.
|
||||
Passphrase obtained via DPAPI:

- `HKLM\[Wow6432Node\]ArchestrA\ArchestrAServices\{SolutionName}\sharedsecret`
- `ProtectedData.Unprotect(bytes, Unicode("wonderware"), LocalMachine)`
|
||||
Source: `src/MxAsbClient/AsbSystemAuthenticator.cs:8–167`, `AsbRegistry.cs:8–67`.
|
||||
|
||||
## Galaxy SQL schema (subset)
|
||||
|
||||
| Table | Columns of interest |
|---|---|
| `dbo.gobject` | tag_name, gobject_id, package_id |
| `dbo.instance` | mx_platform_id, mx_object_id, mx_engine_id |
| `dbo.dynamic_attribute` | mx_data_type, is_array |
| `dbo.package` | inheritance chain (recursive CTE) |
| `dbo.user_profile` | user_guid, user_profile_name, intouch_access_level, roles |
|
||||
`GalaxyRepositoryTagResolver.cs:209–293` is the canonical query (recursive CTE for `deployed_package_chain` → `ranked_dynamic` → `primitive_attributes`).
|
||||
|
||||
User-role parsing: hex-encoded UTF-16LE blob, scan for null-terminated wide-char strings (`GalaxyRepositoryUserResolver.cs:87–133`).
|
||||
|
||||
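The user-role scan described above can be approximated as: decode the hex text to bytes, pair the bytes into UTF-16LE code units, and split on zero units. The sketch below does only that; it is a guess at the general shape rather than a port of `GalaxyRepositoryUserResolver`, which may filter or validate the recovered strings further, and the function name is hypothetical.

```rust
/// Extracts null-terminated wide-char strings from a hex-encoded
/// UTF-16LE blob. Empty runs between terminators are skipped.
pub fn parse_wide_strings(hex: &str) -> Result<Vec<String>, String> {
    let hex = hex.trim();
    if hex.len() % 2 != 0 {
        return Err("odd number of hex digits".to_string());
    }
    // Hex text -> raw bytes.
    let mut bytes = Vec::with_capacity(hex.len() / 2);
    let mut i = 0;
    while i < hex.len() {
        let pair = match hex.get(i..i + 2) {
            Some(p) => p,
            None => return Err("non-ASCII input".to_string()),
        };
        let b = u8::from_str_radix(pair, 16).map_err(|e| e.to_string())?;
        bytes.push(b);
        i += 2;
    }
    // Raw bytes -> UTF-16LE code units (a trailing odd byte is ignored).
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|c| u16::from_le_bytes([c[0], c[1]]))
        .collect();
    // Split on NUL terminators and decode each non-empty run.
    let mut out = Vec::new();
    for run in units.split(|&u| u == 0) {
        if !run.is_empty() {
            out.push(String::from_utf16_lossy(run));
        }
    }
    Ok(out)
}
```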
## HRESULT / status codes (observed)
|
||||
|
||||
| HRESULT | Meaning | Source |
|---|---|---|
| `0x00000000` | S_OK | trivially |
| `0x00000001` | S_FALSE (pending/retry) | observed in callbacks |
| `0x80004021` | Returned by `MxNativeSession.WriteSecuredAsync` (the .NET native reimplementation) before reaching the wire (`src/MxNativeClient/MxNativeSession.cs:218-221`). **NOT** a real LMX-proxy constraint: `wwtools/mxaccesscli/` verifies the production LMX `WriteSecured` always takes two user ids `(currentUserId, verifierUserId, value)` and accepts single-user secured writes as `currentUserId == verifierUserId`. See R6 in `70-risks-and-open-questions.md`. | `docs/DotNet10-Native-Library-Plan.md`; `wwtools/mxaccesscli/docs/api-notes.md:60-72` |
| `0x80070057` | E_INVALIDARG | observed for stale handles |
| `0x8007139F` | `ERROR_INVALID_STATE`, observed from `IDataConsumer.ProcessActivateSuspend2` while the namespace is not yet activated. Despite the canonical Win32 name, the codebase elsewhere has labelled this "uninitialized object" / `EngineNotRegistered`; the canonical Win32 mapping is `ERROR_INVALID_STATE` per `docs/Capture-Run-2026-04-25.md:888` and `docs/MXAccess-Public-API.md:326`. | `docs/Capture-Run-2026-04-25.md:872,886,888`; `docs/MXAccess-Public-API.md:313,326`; `docs/Current-Sprint-State.md:119,122` |
| `0x80040154` | REGDB_E_CLASSNOTREG | callback proxy/stub missing |
| `0x8001011D` | ORPC callback OBJREF rejected (security binding) | observed in callback flows |
| `0x800706BA` | RPC server unavailable | NmxSvc not running |
|
||||
## Status detail codes (subset)
|
||||
|
||||
From `MxStatusSource.RespondingNmx`-side callbacks:

- 16 = timeout
- 17 = platform-comm failure
- 18 = invalid platform id
- 21 = invalid reference
- 22 = no Galaxy Repository
- 23 = invalid object id
- 30 = type mismatch
- 31 = not readable
- 32 = not writeable
- 33 = access denied
- 56 = secured-write related
- 57 = verified-write related
|
||||
⚠ **Three different on-wire widths carry "detail"-shaped numbers; do not conflate them.**
|
||||
|
||||
| Field | Width | Signedness | Where it lives | Source |
|---|---|---|---|---|
| `MxStatus.Detail` | 2 bytes | **`i16`** (signed `short`) | `MxStatus` record, the canonical promoted-status type | `src/MxNativeCodec/MxStatus.cs:32` (`short Detail`) |
| DataUpdate / SubscriptionStatus record `quality` | 2 bytes | **`u16`** (unsigned, bitmask, mask `0x00C0`) | per-record header in `0x32`/`0x33` frames | `src/MxNativeCodec/NmxSubscriptionMessage.cs:136` |
| Record `status` and (SubscriptionStatus only) `detailStatus` | 4 bytes each | **`i32`** (signed) | per-record header in `0x32`/`0x33` frames | `src/MxNativeCodec/NmxSubscriptionMessage.cs:126` (`status`), `:132` (`detailStatus`) |
|
||||
When promoting a record to an `MxStatus`, the i32 `detailStatus` is **narrowed** to `i16` — sign-extension applies on the way out, but values outside `i16` range are not representable. The Rust port must preserve the on-wire width on each layer (don't widen everything to `i32` on parse) so that an out-of-range `detailStatus` is a typed error rather than a silent truncation.
|
||||
|
||||
Source: `docs/MXAccess-Public-API.md`, the MxStatus parsing in `src/MxNativeCodec/MxStatus.cs:3–126`, and the per-record decoder in `src/MxNativeCodec/NmxSubscriptionMessage.cs:117–149`.
|
||||
|
||||
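The narrowing rule can be shown with a checked conversion. This is a sketch with a hypothetical error type; the point is only that an out-of-range `detailStatus` becomes an error instead of wrapping.

```rust
/// Raised when a 4-byte `detailStatus` does not fit `MxStatus.Detail`.
#[derive(Debug, PartialEq, Eq)]
pub struct DetailOutOfRange(pub i32);

/// Narrows the on-wire i32 `detailStatus` to the i16 used by `MxStatus`,
/// failing instead of truncating when the value is out of range.
pub fn narrow_detail(detail_status: i32) -> Result<i16, DetailOutOfRange> {
    if detail_status >= i16::MIN as i32 && detail_status <= i16::MAX as i32 {
        Ok(detail_status as i16)
    } else {
        Err(DetailOutOfRange(detail_status))
    }
}
```

A plain `as i16` cast would compile and silently wrap, which is the failure mode the paragraph above warns against.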
## Completion-only frames
|
||||
|
||||
5-byte completion frame `00 00 50 80 00` → `MxStatus.WriteCompleteOk` (the only proven mapping).
|
||||
|
||||
1-byte completion frames (`0x00`, `0x41`, `0xEF`) are preserved as raw, unpromoted statuses (`work_remain.md:164–174`). Do not synthesise typed completion events from them.
|
||||
|
||||
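One way to honour both rules is a classifier that promotes only the proven 5-byte frame and hands everything else back untouched. The enum and function names below are hypothetical.

```rust
/// The only completion frame with a proven mapping.
const WRITE_COMPLETE_OK: [u8; 5] = [0x00, 0x00, 0x50, 0x80, 0x00];

#[derive(Debug, PartialEq, Eq)]
pub enum Completion<'a> {
    /// Promoted from the exact 5-byte frame `00 00 50 80 00`.
    WriteCompleteOk,
    /// Anything else, including the 1-byte frames, kept as raw bytes.
    Raw(&'a [u8]),
}

pub fn classify_completion<'a>(frame: &'a [u8]) -> Completion<'a> {
    if frame == &WRITE_COMPLETE_OK[..] {
        Completion::WriteCompleteOk
    } else {
        Completion::Raw(frame)
    }
}
```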
## Items not yet wire-proven
|
||||
|
||||
- Multi-sample buffered batches (`recordCount > 1` in `0x33` frames). Provider does not currently emit them.
- Generic `OperationComplete` events outside the proven 5-byte completion frame.
- Activate/Suspend transition events.
- ASB write timestamp + status fields in publish responses.
|
||||
Listed with mitigation strategy in `70-risks-and-open-questions.md`. The codec must accept these as opaque bytes and not promote them.
|
||||
@@ -0,0 +1,304 @@
|
||||
# Error model
|
||||
|
||||
## Goals
|
||||
|
||||
- No bare `HRESULT` integers in the public API.
- Every operation returns `Result<T, mxaccess::Error>`.
- `Error` is `#[non_exhaustive]` so adding variants is non-breaking.
- Every error variant carries enough context to debug without re-running.
- No panics in the public API.
|
||||
## Public `mxaccess::Error`
|
||||
|
||||
```rust
use std::{borrow::Cow, sync::Arc, time::Duration};

#[derive(Debug, thiserror::Error)]
#[non_exhaustive]
pub enum Error {
    #[error("connection: {0}")]
    Connection(#[from] ConnectionError),

    #[error("authentication: {0}")]
    Auth(#[from] AuthError),

    #[error("protocol: {0}")]
    Protocol(#[from] ProtocolError),

    #[error("configuration: {0}")]
    Configuration(#[from] ConfigError),

    #[error("type mismatch on {reference}: expected {expected:?}, got {actual:?}")]
    TypeMismatch { reference: Arc<str>, expected: MxValueKind, actual: MxValueKind },
    // The `{expected:?} {actual:?}` formatting above requires `MxValueKind: Debug`.
    // The codec layer guarantees this: `MxValueKind`, `MxValue`, `MxStatus`,
    // `MxStatusCategory`, and `MxStatusSource` all `#[derive(Debug)]` (this is a
    // public-API requirement of the codec crate, not just a convenience). The
    // .NET reference enum at src/MxNativeCodec/MxValueKind.cs:3-18 is the spec
    // for the variants; the Rust port reproduces them and pins `Debug` as part
    // of the contract so `Error::Display` and `tracing` fields render usefully.

    #[error("security: {0}")]
    Security(#[from] SecurityError),

    #[error("unsupported on {transport:?} transport: {operation}")]
    Unsupported { operation: Cow<'static, str>, transport: TransportKind },

    #[error("operation timed out after {0:?}")]
    Timeout(Duration),

    #[error("operation cancelled")]
    Cancelled,

    #[error("status: success={success} category={category:?} source={detected_by:?} detail={detail}")]
    Status {
        /// `MxStatus.Success` (`short`). `-1` is the documented OK sentinel
        /// (src/MxNativeCodec/MxStatus.cs:29,36-58). Carried verbatim for
        /// byte-parity diagnostics; consumers usually inspect `category` first.
        success: i16,
        category: MxStatusCategory,
        /// Named `detected_by` rather than `source`: `thiserror` treats any
        /// field literally named `source` as the error source and requires it
        /// to implement `std::error::Error`, which `MxStatusSource` does not.
        detected_by: MxStatusSource,
        detail: i16,
    },

    #[error("io: {0}")]
    Io(#[from] std::io::Error),
}
```
|
||||
Sub-errors (`ConnectionError`, `AuthError`, `ProtocolError`, `ConfigError`, `SecurityError`) are similar `thiserror`-derived `#[non_exhaustive]` enums. Sources preserved via `#[source]`.
|
||||
|
||||
`MxStatusCategory` and `MxStatusSource` (re-exported from `mxaccess-codec`) are likewise `#[non_exhaustive]` — AVEVA may legally introduce new short-valued category/source variants, and the .NET reference already has `Unknown=-1` open-set sentinels (src/MxNativeCodec/MxStatus.cs:3,17). Without `#[non_exhaustive]`, downstream `match` statements would freeze the API on the first new category. The Rust port mirrors the **seven** `MxStatusSource` values one-for-one as named in the .NET enum — `Unknown=-1, RequestingLmx=0, RespondingLmx=1, RequestingNmx=2, RespondingNmx=3, RequestingAutomationObject=4, RespondingAutomationObject=5` (src/MxNativeCodec/MxStatus.cs:17-26). `DetectedBy` is essential for diagnostics: it tells the consumer which layer (requesting LMX vs responding NMX vs the automation object itself) detected the fault. Verified against wwtools/mxaccesscli/docs/api-notes.md:107-136 — every `MxStatusInfo[]` entry exposes `DetectedBy`.
|
||||
|
||||
```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
#[non_exhaustive]
#[repr(i16)]
pub enum MxStatusCategory {
    Unknown = -1,
    Ok = 0,
    Pending = 1,
    Warning = 2,
    CommunicationError = 3,
    ConfigurationError = 4,
    OperationalError = 5,
    SecurityError = 6,
    SoftwareError = 7,
    OtherError = 8,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
#[non_exhaustive]
#[repr(i16)]
pub enum MxStatusSource {
    Unknown = -1,
    RequestingLmx = 0,
    RespondingLmx = 1,
    RequestingNmx = 2,
    RespondingNmx = 3,
    RequestingAutomationObject = 4,
    RespondingAutomationObject = 5,
}
```
|
||||
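Because both enums are open sets, converting a raw `short` from the wire needs a fallible path. The sketch below repeats a local copy of `MxStatusSource` purely so the example is self-contained; `from_wire` is a hypothetical helper, and returning `None` for unrecognised values (instead of collapsing them to `Unknown`) is an assumption made here so the caller can decide whether to keep the raw number.

```rust
/// Local copy of the source enum so this example stands alone.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
#[non_exhaustive]
#[repr(i16)]
pub enum MxStatusSource {
    Unknown = -1,
    RequestingLmx = 0,
    RespondingLmx = 1,
    RequestingNmx = 2,
    RespondingNmx = 3,
    RequestingAutomationObject = 4,
    RespondingAutomationObject = 5,
}

impl MxStatusSource {
    /// Maps a raw wire value to a named variant; `None` means the value is
    /// outside the seven currently known and should be preserved by the caller.
    pub fn from_wire(raw: i16) -> Option<Self> {
        Some(match raw {
            -1 => Self::Unknown,
            0 => Self::RequestingLmx,
            1 => Self::RespondingLmx,
            2 => Self::RequestingNmx,
            3 => Self::RespondingNmx,
            4 => Self::RequestingAutomationObject,
            5 => Self::RespondingAutomationObject,
            _ => return None,
        })
    }

    /// Raw wire value of this variant.
    pub fn to_wire(self) -> i16 {
        self as i16
    }
}
```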
```rust
#[derive(Debug, thiserror::Error)]
#[non_exhaustive]
pub enum ConnectionError {
    #[error("tcp connect to {addr} failed")]
    TcpConnect { addr: SocketAddr, #[source] source: std::io::Error },
    #[error("OXID resolve failed")]
    OxidResolve { #[source] source: RpcError },
    // `RpcError` is a public, stable, `std::error::Error`-implementing type
    // defined in the `mxaccess-rpc` crate (per the topology in 30-crate-topology.md)
    // and re-exported here. Its declaration is:
    //
    //     #[derive(Debug, thiserror::Error)]
    //     #[non_exhaustive]
    //     pub enum RpcError { /* DCE/RPC framing/PDU/NDR variants */ }
    //
    // Required because `#[source]` on the `OxidResolve` variant above demands
    // `std::error::Error + 'static`. The variant set is documented in the raw
    // layer; this crate guarantees only that `RpcError: std::error::Error +
    // Send + Sync + Debug + Display + 'static` and is `#[non_exhaustive]` so
    // new RPC failure modes can be added without a breaking change.
    #[error("RPC server unavailable")]
    ServerUnavailable,
    #[error("callback proxy/stub not registered (REGDB_E_CLASSNOTREG)")]
    CallbackProxyMissing,
    #[error("engine not registered (UninitializedObject)")]
    EngineNotRegistered,
}

#[derive(Debug, thiserror::Error)]
#[non_exhaustive]
pub enum AuthError {
    #[error("NTLM rejected: {reason}")]
    Ntlm { reason: String },
    #[error("ASB DH handshake failed: {reason}")]
    AsbHandshake { reason: String },
    #[error("DPAPI shared secret unavailable: {reason}")]
    SharedSecret { reason: String },
}

#[derive(Debug, thiserror::Error)]
#[non_exhaustive]
pub enum ProtocolError {
    #[error("decode at offset {offset} ({reason}); buffer len {buffer_len}")]
    Decode { offset: usize, reason: &'static str, buffer_len: usize },
    #[error("inner length {declared} does not match body length {actual}")]
    InnerLengthMismatch { declared: i32, actual: usize },
    #[error("unexpected opcode {0:#x}")]
    UnexpectedOpcode(u8),
    #[error("envelope mismatch: {reason}")]
    EnvelopeMismatch { reason: &'static str },
}

#[derive(Debug, thiserror::Error)]
#[non_exhaustive]
pub enum ConfigError {
    /// Covers both ill-formed arguments and stale/unknown item handles.
    /// The proven x86 stack returns `0x80070057 E_INVALIDARG` for stale
    /// handles (captures/008-write-test-int-same-value/harness.log:7,
    /// analysis/ghidra/exports/LmxProxy.dll.item-helper-decompile.md:60,75,88,164).
    /// Discriminate via `detail` if a finer split is justified by future evidence.
    #[error("invalid argument: {detail}")]
    InvalidArgument { detail: String },
    #[error("galaxy resolver: {reason}")]
    Galaxy { reason: String },
}

#[derive(Debug, thiserror::Error)]
#[non_exhaustive]
pub enum SecurityError {
    #[error("callback OBJREF rejected (HRESULT 0x8001011D)")]
    CallbackObjRefRejected,
    #[error("verifier user token required for secured write")]
|
||||
VerifierRequired,
|
||||
}
|
||||
```
|
||||
|
||||
## Mapping rules
|
||||
|
||||
The async layer translates raw protocol outcomes into typed errors at exactly one place: at the boundary where a `Result<_, RawError>` from `mxaccess-nmx` or `mxaccess-asb` becomes a `Result<_, Error>` exposed to the consumer.
|
||||
|
||||
| Source | Maps to |
|
||||
|---|---|
|
||||
| TCP connect failure | `Error::Connection(ConnectionError::TcpConnect { addr, source })` |
|
||||
| OXID resolve failure | `Error::Connection(ConnectionError::OxidResolve { source })` |
|
||||
| NTLM Type1/Type3 reject | `Error::Auth(AuthError::Ntlm { reason })` |
|
||||
| DH handshake mismatch (ASB) | `Error::Auth(AuthError::AsbHandshake { reason })` |
|
||||
| HRESULT `0x80070057` (incl. stale/unknown item handles) | `Error::Configuration(ConfigError::InvalidArgument { detail })` |
|
||||
| HRESULT `0x80040154` | `Error::Connection(ConnectionError::CallbackProxyMissing)` |
|
||||
| HRESULT `0x8007139F` | `Error::Connection(ConnectionError::EngineNotRegistered)` |
|
||||
| HRESULT `0x8001011D` | `Error::Security(SecurityError::CallbackObjRefRejected)` (citation: `docs/NMX-COM-Contracts.md:590-594`, the full surrounding paragraph; no capture currently logs this HRESULT, so promotion to a typed error is based on probe analysis rather than a recorded wire fixture — re-examine if a future capture contradicts the narrative) |
|
||||
| HRESULT `0x800706BA` | `Error::Connection(ConnectionError::ServerUnavailable)` |
|
||||
| HRESULT `0x80004021` from a Write operation | preserved verbatim as `Error::Status { ..raw HRESULT carried }` — no synthesized typed error. Note: this HRESULT was previously mapped to a synthesized "single-token `WriteSecured` not supported" error; that mapping has been **removed** per `wwtools/mxaccesscli/` verification. Real MxAccess `WriteSecured` always takes two user ids (`(currentUserId, verifierUserId, value)`) and the LMX proxy CLI exposes single-user secured writes as `currentUserId == verifierUserId`. The 0x80004021 in `MxNativeSession.WriteSecuredAsync` (src/MxNativeClient/MxNativeSession.cs:218-221) is a defect of that .NET reimplementation, not a real LMX constraint — see R6 in `70-risks-and-open-questions.md` |
|
||||
| Decoder failure on inbound frame | `Error::Protocol(ProtocolError::Decode { ... })` |
|
||||
| `MxStatus` with `category != Ok` from `read`/`write`/`subscribe` | `Error::Status { success, category, source, detail }` (the full 4-tuple from the .NET `MxStatus` record at src/MxNativeCodec/MxStatus.cs:28-33 — `success` is the documented OK sentinel `-1` and is carried for byte-parity diagnostics per CLAUDE.md "preserve unknown bytes") |
|
||||
| `MxStatus` with `category != Ok` on a `DataChange` from a `Subscription` | **not** an error — set on `DataChange.status`, surfaced through the stream |
|
||||
|
||||
The split between operation-status-as-error and subscription-status-as-data follows from what each caller needs. A subscription frame with `category=Warning, detail=stale` is data the caller wants to receive and inspect; it is not an exception. An operation that returns `category=ConfigurationError, detail=21` ("Invalid reference") produced no value, so it is surfaced as `Err`.
|
||||
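The HRESULT rows of the table above can be sketched as a single `match`. This is a simplified illustration: `Mapped` and `map_hresult` are hypothetical stand-ins for the real `Error` hierarchy, and the `Raw` variant stands for the "preserved verbatim" behaviour the table describes for `0x80004021`.

```rust
/// Hypothetical, flattened stand-in for the typed errors named in the table.
#[derive(Debug, PartialEq, Eq)]
pub enum Mapped {
    InvalidArgument,
    CallbackProxyMissing,
    EngineNotRegistered,
    CallbackObjRefRejected,
    ServerUnavailable,
    /// No typed mapping: the raw HRESULT is carried unchanged.
    Raw(u32),
}

/// Map an HRESULT to a typed outcome in exactly one place.
pub fn map_hresult(hr: u32) -> Mapped {
    match hr {
        0x8007_0057 => Mapped::InvalidArgument,
        0x8004_0154 => Mapped::CallbackProxyMissing,
        0x8007_139F => Mapped::EngineNotRegistered,
        0x8001_011D => Mapped::CallbackObjRefRejected,
        0x8007_06BA => Mapped::ServerUnavailable,
        other => Mapped::Raw(other),
    }
}
```

Keeping the mapping in one function makes it easy to audit against the table when a new capture changes the evidence.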
|
||||
## Cancellation semantics
|
||||
|
||||
- **Drop**: dropping a future cancels it. The library does not `block_on` inside drop.
|
||||
- **Subscription drop**: each `Subscription` holds a `tokio::sync::oneshot::Sender<UnAdviseRequest>` that signals `UnAdvise` to the task that owns the connection. On drop, the sender is consumed and the connection task fires `UnAdvise` best-effort. The drop returns immediately; no async cleanup is awaited. If the best-effort `UnAdvise` fails (e.g. the connection is already gone), the failure is logged via `tracing::warn!` with the subscription handle and the underlying error, and the drop completes regardless. (Log signature precedent: `captures/probe-add-remove.log:7`, which records the .NET `LMX_UnAdvise` per-handle warning emitted when the proxy reports a non-Ok status during teardown.)
|
||||
- **Session shutdown**: `Session::shutdown(timeout)` returns `Ok(())` once `UnregisterEngine` completes or the timeout elapses. Dropping a `Session` without calling `shutdown` spawns a best-effort cleanup task with no timeout. That is fine for tests; production code should call `shutdown` explicitly.
|
||||
- **`CancellationToken`**: long operations (`subscribe_buffered`, recovery, `connect`) accept an optional token. Cancellation surfaces as `Error::Cancelled`.
|
||||
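- **Timeout**: `tokio::time::timeout` can wrap any operation because every public `async fn` in the library is designed to be cancel-safe; this is a property the implementation must maintain, not one that `async fn` provides automatically.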
|
||||
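The non-blocking drop described for subscriptions can be sketched with the standard library alone. This is an illustration under stated assumptions: `std::sync::mpsc` stands in for `tokio::sync::oneshot`, the `handle` field and the `Subscription::new` constructor are hypothetical, and the receiver plays the role of the connection task.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical request shape; the real type may carry more than a handle.
pub struct UnAdviseRequest {
    pub handle: u32,
}

pub struct Subscription {
    handle: u32,
    // `Option` so that `drop` can take the sender exactly once.
    unadvise_tx: Option<Sender<UnAdviseRequest>>,
}

impl Subscription {
    /// Returns the subscription plus the receiving end that the
    /// connection-owning task would hold.
    pub fn new(handle: u32) -> (Self, Receiver<UnAdviseRequest>) {
        let (tx, rx) = channel();
        (Subscription { handle, unadvise_tx: Some(tx) }, rx)
    }
}

impl Drop for Subscription {
    fn drop(&mut self) {
        if let Some(tx) = self.unadvise_tx.take() {
            // Sending on an unbounded channel never blocks. An error only
            // means the connection task is already gone, so it is ignored
            // here (the real library would log it instead).
            let _ = tx.send(UnAdviseRequest { handle: self.handle });
        }
    }
}
```

Nothing in `drop` waits for the `UnAdvise` to complete, which is the property the bullet above requires.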
|
||||
## Panic policy
|
||||
|
||||
Public API panics only for invariants the type system enforces (e.g. an array length that cannot be wrong). In practice this means: **no `unwrap()`, `expect()`, or `panic!()` in non-test code paths**.
|
||||
|
||||
Lints:
|
||||
- `clippy::unwrap_used = "deny"`
|
||||
- `clippy::expect_used = "deny"`
|
||||
- `clippy::panic = "deny"`
|
||||
- `clippy::indexing_slicing = "deny"` — out-of-bounds `slice[i]` is a panic; codecs use `slice.get(i).ok_or(...)?` instead.
|
||||
- `clippy::unreachable = "deny"` — `unreachable!()` is still a panic; force `Err(ProtocolError::...)` on supposedly-impossible decoder branches so the wire is never trusted to match the type system.
|
||||
- `clippy::todo = "deny"` — `todo!()` must not ship in non-test code.
|
||||
- `clippy::arithmetic_side_effects = "warn"` — overflow-on-debug is a real footgun, but the codec does legitimate `u8`/`u16` field arithmetic where the lint is noisy; `warn` keeps it visible without blocking honest cases. Use `wrapping_*`/`checked_*` explicitly when the intent is non-default.
|
||||
|
||||
Test modules opt out of the unwrap/expect/indexing rules with the following attribute:
|
||||
|
||||
```rust
|
||||
#![cfg_attr(test, allow(clippy::unwrap_used, clippy::expect_used, clippy::indexing_slicing))]
|
||||
```
|
||||
|
||||
placed at the top of the test module (or `#[cfg(test)] mod tests { #![allow(...)] ... }` for inline test modules). CI rejects PRs that introduce any of the denied lints in non-test paths.
|
||||
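The `slice.get(i).ok_or(...)?` style that `clippy::indexing_slicing` forces on codecs can be sketched as follows. `DecodeError` is a simplified stand-in for `ProtocolError::Decode`, and `read_u16_le` is a hypothetical helper; the little-endian byte order is an assumption made for the example only.

```rust
/// Simplified stand-in for `ProtocolError::Decode`.
#[derive(Debug, PartialEq, Eq)]
pub struct DecodeError {
    pub offset: usize,
    pub reason: &'static str,
    pub buffer_len: usize,
}

/// Read a little-endian `u16` at `offset` without indexing or unwrapping.
pub fn read_u16_le(buf: &[u8], offset: usize) -> Result<u16, DecodeError> {
    let err = |reason| DecodeError { offset, reason, buffer_len: buf.len() };
    // `checked_add` keeps a hostile offset from overflowing.
    let end = offset.checked_add(2).ok_or_else(|| err("offset overflow"))?;
    // `get` returns `None` instead of panicking when the range is out of bounds.
    match buf.get(offset..end) {
        Some(&[lo, hi]) => Ok(u16::from_le_bytes([lo, hi])),
        _ => Err(err("truncated u16")),
    }
}
```

Every failure path carries the offset, a static reason, and the buffer length, so a truncated frame becomes an `Err` rather than a panic.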
|
||||
## Diagnostics surface
|
||||
|
||||
The `Display` output of `Error` is short and operator-friendly. `Error::source()` walks the cause chain. The `Debug` output is verbose, for logs.
|
||||
|
||||
```rust
|
||||
match session.write("X.Y", v).await {
|
||||
Ok(()) => {}
|
||||
Err(e) => {
|
||||
tracing::error!(error = ?e, error.source = ?e.source(), "write failed");
|
||||
return Err(e);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
For protocol-level decode failures, every `ProtocolError::Decode` carries:
|
||||
- byte offset in the frame
|
||||
- reason (`&'static str` — interned message)
|
||||
- buffer length
|
||||
|
||||
This matches the .NET reference's tendency to throw `ArgumentException` with byte-offset messages.
|
||||
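Walking the cause chain through `source()` can be sketched with the standard library only. `ConnectFailed` is a hypothetical error type written by hand here; the design above derives the equivalent with `thiserror`, and `render_chain` is an illustrative helper rather than part of the API.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error with one underlying cause.
#[derive(Debug)]
pub struct ConnectFailed {
    pub source: std::io::Error,
}

impl fmt::Display for ConnectFailed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Short, operator-friendly text; the cause is exposed via `source()`.
        write!(f, "tcp connect failed")
    }
}

impl Error for ConnectFailed {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Join an error and all of its causes into one line.
pub fn render_chain(err: &(dyn Error + 'static)) -> String {
    let mut out = err.to_string();
    let mut current = err.source();
    while let Some(cause) = current {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        current = cause.source();
    }
    out
}
```

Keeping `Display` short and pushing detail into the chain lets callers choose how much to print.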
|
||||
## Recovery decisions
|
||||
|
||||
`Error::is_retryable() -> bool` informs the default `RecoveryPolicy`:
|
||||
|
||||
| Variant | Retryable? |
|
||||
|---|---|
|
||||
| `Connection(ServerUnavailable)` | no — `0x800706BA` in the proven stack is a structural callback-marshalling failure (no SYN to the advertised callback port after security bindings are added), not a flapping `NmxSvc.exe`. Retrying loops forever. Evidence: `analysis/proxy/managed-registerengine2-callback-probe.txt:8`, `docs/NMX-COM-Contracts.md:592`. |
|
||||
| `Connection(TcpConnect)` | depends on the inner `io::Error.kind()`, using the `Io(_)` dispatch below plus `ConnectionReset`: `WouldBlock`/`Interrupted`/`TimedOut`/`ConnectionReset` → yes; `AddrNotAvailable`/`HostUnreachable`/`ConnectionRefused` → no. Implementation: `is_retryable()`'s `Connection(TcpConnect { source, .. })` arm delegates to `source.kind()` rather than returning a blanket yes; otherwise a permanently unreachable address (e.g. name-not-found) hot-loops. |
|
||||
| `Connection(OxidResolve)` | yes |
|
||||
| `Connection(CallbackProxyMissing)` | no (config issue) |
|
||||
| `Connection(EngineNotRegistered)` | yes (re-register) |
|
||||
| `Auth(Ntlm)` | no (credentials are wrong) |
|
||||
| `Auth(AsbHandshake)` | no |
|
||||
| `Auth(SharedSecret)` | no |
|
||||
| `Configuration(*)` | no |
|
||||
| `TypeMismatch` | no |
|
||||
| `Security(*)` | no |
|
||||
| `Status` with `category=CommunicationError` | yes |
|
||||
| `Status` with `category=Pending` | yes (with backoff) |
|
||||
| `Status` with `category=ConfigurationError` | depends on `detail`: `detail == 21` ("Invalid reference", src/MxNativeCodec/MxStatus.cs:76) → yes, after a Galaxy-resolver-cache refresh — the .NET reference's per-operation `ResolveTagAsync` (src/MxNativeClient/MxNativeSession.cs:173,196,232,255) makes a stale-cache miss recoverable on the next attempt. All other `ConfigurationError` details → no. Implementation: `is_retryable()` matches the inner `detail` and returns `true` only for the resolver-refresh-eligible codes. |
|
||||
| `Status` other categories | no by default; consumer can override |
|
||||
| `Timeout` | yes |
|
||||
| `Cancelled` | no |
|
||||
| `Protocol(*)` | no — protocol-level decode bugs are not retryable |
|
||||
| `Io(_)` | depends on `kind()`: `WouldBlock`/`Interrupted`/`TimedOut` → yes; everything else → no |
|
||||
|
||||
The default `RecoveryPolicy::exponential` calls `is_retryable()` before each re-attempt; when it returns `false`, the policy stops retrying and emits `RecoveryEvent::Failed { error, .. }`.
|
||||
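The kind-based and detail-based rows of the table can be sketched as plain predicates. The function names and the trimmed `Category` enum are hypothetical, and the `i32` type for `detail` is an assumption; only the decisions themselves come from the table.

```rust
use std::io::ErrorKind;

/// `Io(_)` row: only transient kinds are retryable.
pub fn io_kind_is_retryable(kind: ErrorKind) -> bool {
    matches!(
        kind,
        ErrorKind::WouldBlock | ErrorKind::Interrupted | ErrorKind::TimedOut
    )
}

/// `Connection(TcpConnect)` row: the transient kinds plus `ConnectionReset`.
pub fn tcp_connect_is_retryable(kind: ErrorKind) -> bool {
    io_kind_is_retryable(kind) || kind == ErrorKind::ConnectionReset
}

/// Trimmed stand-in for the status categories the table distinguishes.
pub enum Category {
    CommunicationError,
    Pending,
    ConfigurationError,
    Other,
}

/// `Status` rows: `detail == 21` ("Invalid reference") is the only
/// retryable configuration error.
pub fn status_is_retryable(category: Category, detail: i32) -> bool {
    match category {
        Category::CommunicationError | Category::Pending => true,
        Category::ConfigurationError => detail == 21,
        Category::Other => false,
    }
}
```

Delegating to `kind()` and `detail` rather than answering per variant is what keeps a permanent failure from being retried forever.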
|
||||
## Status as data on subscriptions
|
||||
|
||||
```rust
|
||||
pub struct DataChange {
|
||||
pub reference: Arc<str>,
|
||||
pub value: MxValue,
|
||||
pub quality: u16,
|
||||
pub timestamp: SystemTime,
|
||||
pub status: MxStatus,
|
||||
}
|
||||
```
|
||||
|
||||
Consumers inspect `status.category` to distinguish good/uncertain/bad data. By default `mxaccess` does not filter; a consumer that wants Ok-only data filters the stream itself:
|
||||
|
||||
```rust
|
||||
// `filter_map` with an async closure comes from `futures::StreamExt`, which must be in scope.
let mut sub = session.subscribe("X.Y").await?
|
||||
.filter_map(|change| async move {
|
||||
match change {
|
||||
Ok(c) if c.status.category == MxStatusCategory::Ok => Some(Ok(c)),
|
||||
Ok(_) => None, // drop non-Ok
|
||||
Err(e) => Some(Err(e)),
|
||||
}
|
||||
});
|
||||
```
|
||||
|
||||
This keeps `Error` reserved for actual failures and keeps stream items composable.
|
||||
@@ -0,0 +1,186 @@
|
||||
# Roadmap
|
||||
|
||||
The Rust port is staged so each milestone is independently usable: a milestone delivers either a useful crate, a measurable test improvement, or a concrete API surface. Earlier milestones never depend on later ones.
|
||||
|
||||
## Phasing
|
||||
|
||||
### M0 — Workspace skeleton
|
||||
|
||||
- Create `rust/` workspace per `30-crate-topology.md`.
|
||||
- Pin Rust toolchain via `rust-toolchain.toml`.
|
||||
- CI: `cargo build --workspace`, `cargo test --workspace`, `cargo clippy --workspace -- -D warnings`, `cargo fmt --check`.
|
||||
- Test infrastructure: `tests/fixtures/` populated from `captures/` (copy — junctions are Windows-only and don't survive `git clone` on Linux/macOS; symlinks need Developer Mode on Windows; a plain copy is the only cross-platform option). The matching line in `30-crate-topology.md:29` says the same thing — flag any drift to the user.
|
||||
- `mxaccess-codec` exposes the type stubs (method bodies that call `unimplemented!()`) so downstream crates compile.
|
||||
- **Define the `Transport` trait (and a placeholder `Session` shape it returns) in M0** as empty/stub signatures in `mxaccess` so M5 can build against the trait without waiting for M4's NMX implementation. This is what allows M5 to run in parallel with M3/M4 — see "Sequencing dependencies" below. M4 fills in the concrete `NmxTransport` impl + recovery policy + `Stream<Item = DataChange>` plumbing; M5 fills in `AsbTransport` against the same trait.
|
||||
- Update `CLAUDE.md` "Common commands" with cargo invocations.
|
||||
|
||||
**Definition of done:** `cargo build --workspace` succeeds; CI is green on a clean commit; the publish-check for the empty crates (`cargo publish --dry-run -p mxaccess-codec`) passes. **Dependency on Q2 (license).** `cargo publish --dry-run` requires a resolved `license` field in `Cargo.toml`; until Q2 in `70-risks-and-open-questions.md` is settled, the publish-check is downgraded to `cargo build --release -p mxaccess-codec`, so M0 cannot complete with the full publish-check until Q2 is resolved.
|
||||
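The M0 trait split can be pictured with a small synchronous sketch. Everything here is illustrative: the real `Transport` trait is presumably async and much richer, and the `name` and `supports_buffered` methods are hypothetical. The only grounded fact is that the ASB path does not offer `subscribe_buffered` while the NMX path does.

```rust
/// Hypothetical minimal trait; M0 would publish the real (stub) version.
pub trait Transport {
    fn name(&self) -> &'static str;
    fn supports_buffered(&self) -> bool;
}

/// Filled in by M4.
pub struct NmxTransport;
/// Filled in by M5, against the same trait.
pub struct AsbTransport;

impl Transport for NmxTransport {
    fn name(&self) -> &'static str { "nmx" }
    fn supports_buffered(&self) -> bool { true }
}

impl Transport for AsbTransport {
    fn name(&self) -> &'static str { "asb" }
    fn supports_buffered(&self) -> bool { false }
}

/// `Session` is written once, against the trait, not against a transport.
pub struct Session<T: Transport> {
    transport: T,
}

impl<T: Transport> Session<T> {
    pub fn new(transport: T) -> Self {
        Session { transport }
    }

    /// Capability check only; no real subscription is created in this sketch.
    pub fn subscribe_buffered(&self) -> Result<(), String> {
        if self.transport.supports_buffered() {
            Ok(())
        } else {
            Err(format!(
                "{} transport does not support subscribe_buffered",
                self.transport.name()
            ))
        }
    }
}
```

Because `Session` depends only on the trait, either transport can be developed without waiting for the other.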
|
||||
### M1 — Codec parity
|
||||
|
||||
Implement every codec type from `src/MxNativeCodec/`:
|
||||
|
||||
- `MxReferenceHandle` (CRC-16/IBM, 20-byte layout)
|
||||
- `NmxTransferEnvelope` + `NmxTransferEnvelopeTemplate`
|
||||
- `NmxItemControlMessage` (advise / supervisory / unadvise)
|
||||
- `NmxWriteMessage` (scalar + array, normal + timestamped)
|
||||
- `NmxSecuredWrite2Message`
|
||||
- `NmxSubscriptionMessage` (DataUpdate, SubscriptionStatus)
|
||||
- `NmxReferenceRegistrationMessage` + Result
|
||||
- `NmxMetadataQueryMessage`
|
||||
- `NmxOperationStatusMessage`
|
||||
- `ObservedWriteBodyTemplate`
|
||||
- ASB Variant + AsbStatus + RuntimeValue
|
||||
- `MxStatus`, `MxValueKind`, `MxDataType`, `MxValue`
|
||||
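For orientation, the checksum commonly called CRC-16/IBM (also catalogued as CRC-16/ARC) can be computed bit by bit as below. This is a sketch under an assumption: the list above names CRC-16/IBM but does not state the exact parameters or which bytes of the reference name are fed to it, so the reflected polynomial `0xA001` with a zero initial value is the conventional variant, not a confirmed detail of `MxReferenceHandle`. The .NET parity tests remain the authority.

```rust
/// CRC-16/ARC: reflected polynomial 0xA001, initial value 0, no final XOR.
/// Assumed variant; verify against the .NET reference before relying on it.
pub fn crc16_arc(data: &[u8]) -> u16 {
    let mut crc: u16 = 0x0000;
    for &byte in data {
        crc ^= u16::from(byte);
        for _ in 0..8 {
            // Shift right one bit; fold in the polynomial when a 1 falls off.
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0xA001 } else { crc >> 1 };
        }
    }
    crc
}
```

The standard check input `b"123456789"` yields `0xBB3D` for this variant, which gives a quick sanity test before comparing against captured handles.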
|
||||
**Definition of done:** every Frida-captured write/advise/subscribe body that the .NET reference encodes today round-trips byte-identical through `mxaccess-codec` — i.e. the proven matrix in `work_remain.md` (scalar/array writes, advise/unadvise, single-record `0x33` DataUpdate, single-record SubscriptionStatus, the 5-byte `00 00 50 80 00` write-complete frame, and the 1-byte completion frames `0x00`/`0x41`/`0xEF` preserved verbatim). Cross-validated against `src/MxNativeCodec.Tests/` outputs (a fixture runner shells out to `dotnet run --project src\MxNativeCodec.Tests` and asserts the same bytes are produced for shared inputs).
|
||||
|
||||
Captures whose native behaviour the .NET reference does not yet decode are explicitly out of scope for M1 and are tracked elsewhere:
|
||||
|
||||
- `captures/077`, `captures/079-082`, `captures/094` — buffered batch payloads (`work_remain.md:176–181`); deferred to M6 + R2.
|
||||
- `captures/036` — Activate/Suspend trigger conditions; deferred to R5.
|
||||
- Single-token `WriteSecured` (returns `0x80004021` before sending the body); deferred to R6.
|
||||
|
||||
These captures are still loaded as fixtures so their headers/envelopes round-trip, but the inner unproven payloads are preserved as opaque bytes rather than asserted against a typed decode.
|
||||
|
||||
### M2 — DCE/RPC + NTLM + OBJREF + OXID + callback exporter
|
||||
|
||||
- `mxaccess-rpc`: NTLMv2 client context, DCE/RPC PDU codec, TCP transport, OBJREF parser, OXID resolution, IRemUnknown::RemQueryInterface.
|
||||
- `mxaccess-callback`: callback exporter (RPC server with `INmxSvcCallback` + `IRemUnknown`).
|
||||
- Live probe: connect to local `NmxSvc.exe`, execute `RegisterEngine2` + `GetPartnerVersion` round-trip, register a callback OBJREF, observe a status frame.
|
||||
|
||||
**Definition of done:** all three of the following must hold against a running AVEVA install (the `partnerVersion`-only check is insufficient on its own — it does not exercise the callback exporter, which the .NET evidence shows is the hardest part of M2 since `MxNativeClient` had to hand-roll `INmxSvcCallback`/`IRemUnknown`):
|
||||
|
||||
1. `cargo run --example connect-nmx` issues `RegisterEngine2` and observes `partnerVersion == 6` in the response (expected value per `docs/DotNet10-Native-Library-Plan.md:64-73`).
|
||||
2. The Rust callback exporter accepts an inbound `IRemUnknown::RemQueryInterface` from `NmxSvc.exe` and returns the negotiated `INmxSvcCallback` interface pointer — i.e. the server-side handshake against our exported OBJREF completes without an `IRemUnknown` reject.
|
||||
3. At least one `INmxSvcCallback::StatusReceived` frame is observed end-to-end through the Rust callback exporter (raw frame bytes captured to a fixture under `tests/fixtures/m2-status-frame/`).
|
||||
|
||||
In addition, NTLMv2 packet integrity must match the .NET reference's `MakeSignature` outputs on a fixed challenge fixture (i.e. a byte-equivalent signature for a fixed input).
|
||||
|
||||
### M3 — NMX session + Galaxy resolver
|
||||
|
||||
- `mxaccess-galaxy`: `tiberius`-based tag resolver, user resolver. Replicates the recursive CTE from `GalaxyRepositoryTagResolver.cs:209–293`.
|
||||
- `mxaccess-nmx`: `NmxClient` with `register_engine_2`, `transfer_data`, `add_subscriber_engine`, `set_heartbeat_send_interval`. Builds `MxReferenceHandle` from resolver output + `MxReferenceHandle::compute_name_signature`.
|
||||
- Live probes: write `TestChildObject.TestInt = 123`, subscribe, receive callback.
|
||||
|
||||
**Definition of done:** scalar write + scalar subscribe round-trip live, identical bytes to `captures/022-frida-write-test-int-sequence-106-108` and `captures/058-frida-subscribe-testint` (verified against `captures/` directory listing — `077-frida-suspend-advised-scanstate` is a *suspend* capture and belongs to R5, not M3). Re-implementations of `dotnet run --project src\MxNativeClient.Probe -- --probe-session-write` and `--probe-session-subscribe` succeed when invoked as `cargo run --example session-write` / `--example session-subscribe`.
|
||||
|
||||
### M4 — Async Tokio façade (NMX path)
|
||||
|
||||
- `mxaccess::Session` over `NmxTransport`.
|
||||
- `mxaccess::Subscription` as `Stream<Item = Result<DataChange, Error>>`.
|
||||
- `Session::write`, `write_with_completion`, `write_with_timestamp`, `write_secured`, `write_secured_at`, `read`, `subscribe`, `subscribe_many`.
|
||||
- Recovery policy + recovery events (mirroring `MxNativeSession.RecoveryAttempt*` events).
|
||||
- `tracing` instrumentation throughout.
|
||||
- Examples: `connect-write-read.rs`, `subscribe.rs`, `recovery.rs`, `multi-tag.rs`.
|
||||
|
||||
**Definition of done:** the public API is end-to-end usable without referencing `mxaccess-codec` directly. A consumer can write 30 lines of `tokio::main` code and get live data. `cargo doc --workspace --open` produces useful API docs. The `examples/` programs all exit `0` against a live AVEVA install.
|
||||
|
||||
**Parity test fixtures** (verified against `wwtools/mxaccesscli/`):
|
||||
- A bare-array reference (e.g. `Obj.Arr` without brackets) returns `MxStatus { category: CommunicationError, detail: 1003 }`. Source: `wwtools/mxaccesscli/docs/usage.md:215,299`. Add this as an `mxaccess` integration test that subscribes to a known bare-array reference and asserts the exact `(category, detail)` tuple.
|
||||
- Read-as-subscribe parity: a `read(tag, timeout)` against a tag that never publishes returns `Error::Timeout(_)`, with no leaked advise on the server side (verified by issuing a subsequent `subscribe` and confirming no stale-handle error). Source: `wwtools/mxaccesscli/docs/usage.md:24` and `wwtools/mxaccesscli/src/MxAccess.Cli/Commands/ReadCommand.cs:14-78`.
|
||||
- Verified Write parity: `write_secured(tag, value, current_user_id, verifier_user_id)` with `current_user_id == verifier_user_id` (single-user path) and with two distinct ids (two-person verification path) both succeed against a tag whose security classification permits it. Source: `wwtools/mxaccesscli/src/MxAccess.Cli/Commands/WriteCommand.cs:151-155,196-199`.
|
||||
|
||||
### M5 — ASB transport
|
||||
|
||||
- `mxaccess-asb-nettcp`: [MS-NMF] net.tcp framing + [MC-NBFX]/[MC-NBFS] binary message encoding (the default `NetTcpBinding` encoder, *not* SOAP/XML — see `src/MxAsbClient/MxAsbDataClient.cs:660-685`) + WCF custom-binary inside ASBIData base64.
|
||||
- `mxaccess-asb`: `AsbClient` with Connect, RegisterItems, Read, Write, CreateSubscription, AddMonitoredItems, Publish, Disconnect.
|
||||
- `mxaccess::Session` over `AsbTransport`; capabilities reflect ASB limits (no `subscribe_buffered`, no `Activate`/`Suspend`, no `OperationComplete` outside the proven write-completion frame).
|
||||
- DPAPI shared-secret read on Windows; explicit `AsbCredentials::shared_secret(&[u8])` constructor as escape hatch.
|
||||
|
||||
**Definition of done:** `cargo run --example asb-subscribe -- --tag TestChildObject.TestInt` succeeds against a live ASB endpoint. Round-trip parity with `dotnet run --project src\MxAsbClient.Probe`. Type matrix in `mxaccess-asb` covers what `work_remain.md:108–113` documents as proven: scalar Boolean, Int32, Float, Double, String, DateTime, Duration, plus "deployed array tags" (the array shapes actually exercised against the live VM, not all eight scalar arrays). Less-common ASB types and the unexercised scalar array shapes are deferred — added only as needed by real deployed tags, per `work_remain.md:110`.
|
||||
|
||||
### M6 — Compatibility shim + production hardening
|
||||
|
||||
- `mxaccess-compat`: `LMXProxyServer`-shaped methods on top of `Session`.
|
||||
- `subscribe_buffered` (NMX feature) — guarded by `BufferedOptions`; no synthesis if provider returns single-sample batches.
|
||||
- Performance pass: zero-copy frame parsing where possible (`bytes::Bytes`), pre-allocated `BytesMut` per session, codec allocation count benchmarked.
|
||||
- Optional `metrics` feature emitting counters / histograms.
|
||||
- Docs: `cargo doc` published; `cargo public-api` baseline established.
|
||||
- Release: `cargo publish` all crates.
|
||||
|
||||
**Definition of done:** the codec hits the per-write allocation target from R12 (< 5 allocations per write at steady state, measured via `cargo bench` with a counting allocator); live subscribe under churn does not allocate per-message; `cargo public-api` produces a stable surface that clears review.
|
||||
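A counting allocator of the kind the definition of done refers to can be sketched with the standard library. The names `CountingAlloc`, `ALLOCATIONS`, and `count_allocations` are hypothetical, and the counter is process-wide, so a real benchmark would need to run single-threaded or subtract background noise.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Number of `alloc` calls observed since process start.
pub static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

/// Thin wrapper around the system allocator that counts allocations.
pub struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Run `f` and report how many allocations happened while it ran.
pub fn count_allocations<R>(f: impl FnOnce() -> R) -> (R, usize) {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let result = f();
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    (result, after - before)
}
```

A benchmark would wrap one encode call in `count_allocations` and compare the count with the R12 target.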
|
||||
`cargo bench` latency numbers are reported but **not gating** — this matches the V1 non-goal "we measure but don't gate beyond M6's loose acceptance bar" below. There is no .NET microbench harness to compare against (the .NET reference ships probes and assertion-style runners, not benchmarks); building one is out of scope for V1. If a future milestone adds a comparison harness, document it here and only then can a "comparable to .NET" clause be added without contradicting the non-goal.
|
||||
|
||||
## Validation strategy
|
||||
|
||||
Three lines of defense against regression.
|
||||
|
||||
### 1. Round-trip fixtures
|
||||
|
||||
Every byte sequence in `captures/0NN-frida-*` and `analysis/frida/*.tsv` is a fixture. Test cases load the bytes, decode them, re-encode, and assert equality. New scenarios add new fixtures and never modify old ones. Fixtures live under `crates/mxaccess-codec/tests/fixtures/` as plain copies, since M0 rules out junctions and symlinks for cross-platform checkouts.
|
||||
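The decode, re-encode, compare loop can be sketched as a small generic harness. `round_trip` is a hypothetical helper, not the project's actual test runner; it takes the codec as two closures so the same harness works for any frame type, and it reports the first diverging offset on failure.

```rust
/// Decode `fixture`, re-encode the result, and require byte equality.
pub fn round_trip<T, D, E>(fixture: &[u8], decode: D, encode: E) -> Result<(), String>
where
    D: Fn(&[u8]) -> Result<T, String>,
    E: Fn(&T) -> Vec<u8>,
{
    let value = decode(fixture)?;
    let out = encode(&value);
    if out.as_slice() == fixture {
        return Ok(());
    }
    // First differing byte, or the shorter length if one is a prefix of the other.
    let first_diff = out
        .iter()
        .zip(fixture.iter())
        .position(|(a, b)| a != b)
        .unwrap_or_else(|| out.len().min(fixture.len()));
    Err(format!(
        "re-encoded bytes diverge at offset {} (fixture {} bytes, output {} bytes)",
        first_diff,
        fixture.len(),
        out.len()
    ))
}
```

For frames whose inner payload is still unproven, the "codec" can simply preserve the bytes, for example `round_trip(&[0x00, 0x00, 0x50, 0x80, 0x00], |b| Ok(b.to_vec()), |v: &Vec<u8>| v.clone())`, which is the opaque-bytes treatment the milestones describe.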
|
||||
### 2. Live probes
|
||||
|
||||
Per-milestone live probes mirror the .NET probes (`MxNativeClient.Probe`, `MxAsbClient.Probe`). They run only when the `MX_LIVE` environment variable is set, match the .NET command-line surface, and print the same artifacts. The Rust examples are the canonical live tests.
|
||||
|
||||
**CI lane status (V1).** Live probes require a Windows runner with AVEVA System Platform installed and a populated Galaxy DB. AVEVA System Platform is a licensed Windows-only install with a SQL Galaxy attached, so a hosted CI runner cannot be spun up from a public image. **V1 ships without a hosted CI lane for live probes** — they run only on the maintainer's workstation, gated by `MX_LIVE=1`. PRs that touch the live-probe surface (anything under `crates/*/examples/` invoked when `MX_LIVE` is set, plus `mxaccess-galaxy` and the NMX/ASB transport crates' integration tests) require a screenshot or capture log from a successful local run attached to the PR. Hosted CI for milestones M2/M3/M4/M5 covers `cargo build`, `cargo test --workspace` (non-live), and `cargo clippy` only. Building a hosted live-probe lane (a pinned AVEVA VM image + Galaxy seed snapshot, owned by the project) is a stretch goal post-V1; it is not a V1 deliverable.
|
||||
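The `MX_LIVE` gate can be sketched as a pure predicate plus a thin environment wrapper. Both function names are hypothetical, and treating an empty value or `"0"` as "off" is an assumption; the text above only says the probes run when the variable is set, typically as `MX_LIVE=1`.

```rust
/// Decide whether live probes should run, given the raw variable value.
/// Assumption: empty and "0" count as disabled.
pub fn live_enabled(mx_live: Option<&str>) -> bool {
    match mx_live {
        Some(value) => !value.is_empty() && value != "0",
        None => false,
    }
}

/// Read `MX_LIVE` from the process environment.
pub fn live_enabled_from_env() -> bool {
    live_enabled(std::env::var("MX_LIVE").ok().as_deref())
}
```

A live test would return early when `live_enabled_from_env()` is `false`, so the default `cargo test --workspace` run never touches a real AVEVA install.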
|
||||
### 3. Cross-implementation parity
|
||||
|
||||
For each milestone with a `dotnet run` equivalent, the same operation runs through both:
|
||||
- the .NET reference (`dotnet run --project src\MxNativeClient.Probe -- --probe-session-write ...`)
|
||||
- the Rust port (`cargo run --example session-write -- ...`)
|
||||
|
||||
Wireshark / Frida captures of both runs are diffed; any byte-level divergence is a regression.
|
||||
|
||||
**Caveat — parity is not correctness.** Byte-parity tests confirm the Rust port matches the .NET reference; they do **not** confirm correctness in the absolute sense. Specifically, the completion-only frame mappings (`0x00`, `0x41`, `0xEF`, plus `MXSTATUS_PROXY[]` conversion — see `work_remain.md:170-174`) are unmapped in both implementations; both preserve them verbatim. If the .NET reference is wrong about one of these bytes, the Rust port will also be wrong, and the parity test will still pass green. R3 and R4 in `70-risks-and-open-questions.md` track this. These frames are marked "preserved verbatim" rather than "verified correct" in the milestone DoDs that touch them.
|
||||
|
||||
## Build & test commands
|
||||
|
||||
To be added to `CLAUDE.md` "Common commands" once `rust/` exists:
|
||||
|
||||
```powershell
|
||||
# Workspace-wide
|
||||
cargo build --workspace
|
||||
cargo test --workspace
|
||||
cargo clippy --workspace -- -D warnings
|
||||
cargo fmt --check
|
||||
|
||||
# Single crate
|
||||
cargo test -p mxaccess-codec
|
||||
|
||||
# Live probes (require AVEVA + Galaxy DB). Credentials come from Infisical via
|
||||
# wwtools/secrets/Get-Secret.ps1 — never inline plaintext. The setup script
|
||||
# fetches the WW_VM_ADMIN_* secrets and (when present) the AVEVA-specific
|
||||
# ASB_SHARED_SECRET, then exports MX_LIVE, MX_NMX_HOST, MX_GALAXY_DB,
|
||||
# MX_TEST_USER, MX_TEST_DOMAIN, MX_TEST_PASSWORD, MX_ASB_SHARED_SECRET.
|
||||
. .\tools\Setup-LiveProbeEnv.ps1 # dot-source so env vars persist
|
||||
cargo test -p mxaccess --features live -- --ignored
|
||||
|
||||
# CI / dev fallback when ASB shared secret is not yet in Infisical:
|
||||
. .\tools\Setup-LiveProbeEnv.ps1 -SkipAsbSecret
|
||||
# ...then construct AsbCredentials::shared_secret(&[u8]) explicitly in the test.
|
||||
|
||||
# Examples
|
||||
cargo run --example connect-write-read
|
||||
cargo run --example subscribe -- --tag TestChildObject.TestInt
|
||||
cargo run --example asb-subscribe -- --tag TestChildObject.TestInt
|
||||
|
||||
# Benchmarks (M6)
|
||||
cargo bench -p mxaccess-codec
|
||||
```
|
||||
|
||||
## Sequencing dependencies
|
||||
|
||||
| Milestone | Depends on | Blocks |
|
||||
|---|---|---|
|
||||
| M0 | nothing | M1, M2, M5 |
|
||||
| M1 | M0 | M2, M3, M5 |
|
||||
| M2 | M0, M1 | M3 |
|
||||
| M3 | M1, M2 | M4 |
|
||||
| M4 | M3 | M6 (NMX) |
|
||||
| M5 | M0, M1 (codec only — ASB does not need M2/M3) | M6 (ASB) |
|
||||
| M6 | M4 (NMX consumers) or M5 (ASB consumers) | release |
|
||||
|
||||
M5 can be developed in parallel with M3/M4 because ASB does not depend on DCE/RPC **and** because the `Transport` trait (plus the placeholder `Session` shape it returns) is defined in M0, not M4. M0 publishes the empty/stub trait; M4 fills in the concrete NMX-side `Session` recovery policy, `Stream<Item = DataChange>`, and `NmxTransport` impl; M5 fills in `AsbTransport` against the same M0 trait. Without that M0-level trait split, M5 would block on M4 (since the Session/Transport types live in `mxaccess`, which sits below both transports per `30-crate-topology.md`). The two transport paths converge into the same `Session` at M4 (NMX) and M5 (ASB).
|
||||
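The dependency table can be encoded and checked mechanically. The sketch below is illustrative only: `DEPS` transcribes the "Depends on" column for M0 to M5 (M6 is omitted because its dependency is an either/or), and `order_is_valid` is a hypothetical helper that tests whether a proposed build order respects those edges.

```rust
/// (milestone, direct dependencies), transcribed from the table above.
pub const DEPS: &[(&str, &[&str])] = &[
    ("M0", &[]),
    ("M1", &["M0"]),
    ("M2", &["M0", "M1"]),
    ("M3", &["M1", "M2"]),
    ("M4", &["M3"]),
    ("M5", &["M0", "M1"]),
];

/// True when every milestone in `order` appears after all of its dependencies.
pub fn order_is_valid(order: &[&str]) -> bool {
    order.iter().enumerate().all(|(i, m)| {
        DEPS.iter()
            .find(|(name, _)| name == m)
            // Unknown milestones make the order invalid.
            .map_or(false, |(_, deps)| deps.iter().all(|d| order[..i].contains(d)))
    })
}
```

An order such as `M0, M1, M5, M2, M3, M4` passes the check, which is the parallelism claim in concrete form: M5 needs nothing from M2, M3, or M4.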
|
||||
## What this roadmap deliberately does not include (V1)
|
||||
|
||||
- `cargo bench` numbers as gating criteria. We measure but don't gate beyond M6's loose acceptance bar.
|
||||
- Drop-in COM interop. `mxaccess-compat` wraps the API shape, not the COM ABI. An `mxaccess-compat-com` crate is post-V1.
|
||||
- Multi-runtime support (smol, async-std). Tokio only.
|
||||
- 32-bit Windows. x64 only by design.
|
||||
- Linux-first deployment. Linux is a stretch goal sitting behind feature flags; see `70-risks-and-open-questions.md` Q3.
|
||||
- Full `OperationComplete` parity. Bound to whether the .NET reference can prove the trigger conditions; see R3/R4 in `70-risks-and-open-questions.md`.
|
||||
@@ -0,0 +1,321 @@
|
||||
# Risks and open questions
|
||||
|
||||
This is the punch list of things that could break or are unproven. Each entry is tagged R(isk) or Q(uestion), with a current best answer and what would settle it.
|
||||
|
||||
## Protocol-level
|
||||
|
||||
### R1 — net.tcp / WCF framing **and binary message encoding** complexity
|
||||
|
||||
**Severity: P0** (project-blocker — entire ASB data plane, ~3000 LoC)
|
||||
|
||||
The .NET reference uses `System.ServiceModel.NetTcpBinding` for ASB (`src/MxAsbClient/MxAsbDataClient.cs:663`: `new NetTcpBinding(SecurityMode.None)` with no message-encoding override). With no override, WCF defaults to the **binary message encoder** — i.e. .NET Binary XML ([MC-NBFX]) with a static dictionary lookup ([MC-NBFS]) — **not** SOAP/XML. There is no Rust port of WCF, and `quick-xml` (or any other XML toolkit) is **not sufficient** to read or write these payloads: the body bytes are tokenised binary nodes that reference dictionary string IDs.
|
||||
|
||||
So the hand-rolled scope is two layers, not one:
|
||||
|
||||
1. **Framing** per [MS-NMF] (record types: preamble, preamble-ack, sized-envelope, end, fault) plus the reliable-session ack handling on the underlying `net.tcp` channel.
|
||||
2. **Message encoding** per [MC-NBFX] (binary XML node tokens, length-prefixed strings, prefixed/typed attributes, end-element markers) **plus** [MC-NBFS] (the static dictionary that holds the SOAP/WS-Addressing/`IASBIDataV2`-action strings the encoder references by ID instead of inlining).
|
||||
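One small piece of the framing layer is the variable-length size that prefixes sized records. The sketch below assumes the encoding [MS-NMF] describes for record sizes: seven payload bits per byte, least-significant group first, with the high bit set on every byte except the last, and at most five bytes. The function names are hypothetical, and the captured preambles remain the ground truth.

```rust
/// Append `size` in the 7-bits-per-byte, low-group-first encoding.
pub fn encode_record_size(mut size: u32, out: &mut Vec<u8>) {
    loop {
        let low = (size & 0x7F) as u8;
        size >>= 7;
        if size == 0 {
            out.push(low); // last byte: continuation bit clear
            return;
        }
        out.push(low | 0x80); // more bytes follow
    }
}

/// Decode a size from the start of `buf`.
/// Returns the value and the number of bytes consumed, or `None` when the
/// input is truncated, longer than five bytes, or overflows 32 bits.
pub fn decode_record_size(buf: &[u8]) -> Option<(u32, usize)> {
    let mut value: u32 = 0;
    for (i, &b) in buf.iter().enumerate().take(5) {
        let bits = u32::from(b & 0x7F);
        // The fifth byte may only contribute the top four bits of a u32.
        if i == 4 && bits > 0x0F {
            return None;
        }
        value |= bits << (7 * i as u32);
        if b & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

For example, a 300-byte envelope is prefixed with the two bytes `AC 02`.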
|
||||
**Options:**
|
||||
1. Hand-roll both framing ([MS-NMF]) and binary message encoding ([MC-NBFX] + [MC-NBFS]). Estimate ~3000 LoC across both layers, split roughly evenly: framing alone is ~1500 LoC, and the binary XML codec, dictionary tables, and operation-action mapping are roughly the same again.
|
||||
2. Switch ASB to its HTTP variant if the deployed AVEVA instance supports it (this would let us use a normal text SOAP/XML stack and skip both [MC-NMF] and [MC-NBFX]/[MC-NBFS] entirely).
|
||||
3. Wrap the .NET ASB DLL in a process and call it via stdin/stdout JSON-RPC.
|
||||
|
||||
**Current best answer:** option 1 (hand-roll both layers). The two specs are public, the encoder is deterministic, and the .NET reference's `AsbMessageDumpBehavior` already produces ground-truth byte vectors for the dictionary and operation set we use. `quick-xml` may help with any auxiliary text-XML the wider stack uses, but it cannot decode the binary-encoded message bodies — that requires the [MC-NBFX] + [MC-NBFS] codec.
|
||||
|
||||
**Settles when:** `mxaccess-asb-nettcp` parses every captured request/reply byte-identical to the .NET reference's `IClientChannel` payload dump for the proven type matrix, including correct dictionary-ID resolution and round-trip of every observed binary XML node tag.
|
||||
|
||||
### R2 — Buffered subscription is delivery cadence, not multi-sample payloads
|
||||
|
||||
**Severity: P3** (likely a non-issue — see verification below)
|
||||
|
||||
`subscribe_buffered` was originally framed as "we don't know if the codec layout for multi-sample `DataChangeBatch` is right." Verification against `wwtools/mxaccesscli/docs/api-notes.md:97-100,138-140,154-157` reverses this framing: `OnBufferedDataChange(hServer, hItem, MxDataType, value, quality, timestamp, statuses)` is **single-sample-per-event**, identical in shape to `OnDataChange`. The "buffer" is a delivery cadence — `SetBufferedUpdateInterval(ms)` collates per-tick updates and flushes them at the configured interval — **not** a multi-sample payload bundle. The native multi-sample bodies the original R2 worried about may not exist on the LMX surface at all.
|
||||
|
||||
**Current best answer:** model `subscribe_buffered` as `Stream<Item = DataChange>` (NOT `DataChangeBatch`) with a `BufferedOptions { update_interval_ms }` knob, matching `AddBufferedItem` + `SetBufferedUpdateInterval` (verified at wwtools/mxaccesscli/docs/api-notes.md:140). If a future capture surfaces a true multi-sample body, reopen — but the burden of proof has flipped. **Do not synthesise** multi-sample bodies; the LMX surface emits one per event.
|
||||
|
||||
**Settles when:** either (a) a captured `OnBufferedDataChange` event with multi-sample body bytes is observed (which would contradict the LMX docs and require codec rework), or (b) the V1 codec ships and no consumer reports missing multi-sample semantics. Default-positive: this likely settles silently as "not a real risk."
|
||||
|
||||
### R3 — `OperationComplete` trigger unproven
|
||||
|
||||
**Severity: P1** (significant blocker for OperationComplete consumers — ships verbatim, no typed promotion)
|
||||
|
||||
`work_remain.md:154–163`: ASB has no native OperationComplete; NMX completion-only frames have no proven mapping table. The .NET reference does not synthesise the event; the Rust port must not either.
|
||||
|
||||
**Current best answer:** expose `Session::operation_status_events()` as `Stream<Item = RawOperationStatus>` carrying frame bytes. Promote to a typed `WriteCompleted` only if the frame matches the proven `00 00 50 80 00` 5-byte pattern.
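
A minimal sketch of that promotion rule, assuming a hypothetical `RawOperationStatus` that simply owns the frame bytes and a hypothetical `OperationStatus` wrapper enum. Only the 5-byte pattern comes from this document; every type and function name is illustrative.

```rust
/// The only completion frame with a proven meaning (from this document).
pub const WRITE_COMPLETED_FRAME: [u8; 5] = [0x00, 0x00, 0x50, 0x80, 0x00];

/// Hypothetical raw carrier: the frame bytes exactly as received.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RawOperationStatus {
    pub frame: Vec<u8>,
}

/// Hypothetical event type: typed only where evidence exists.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum OperationStatus {
    /// Frame matched the proven pattern; the raw bytes are kept alongside.
    WriteCompleted(RawOperationStatus),
    /// Anything else is passed through verbatim, never interpreted.
    Raw(RawOperationStatus),
}

/// Promotes a frame only on an exact byte-for-byte match.
pub fn classify(frame: &[u8]) -> OperationStatus {
    let raw = RawOperationStatus { frame: frame.to_vec() };
    if frame == &WRITE_COMPLETED_FRAME[..] {
        OperationStatus::WriteCompleted(raw)
    } else {
        OperationStatus::Raw(raw)
    }
}
```

Matching on the whole frame rather than a prefix keeps near-miss frames on the raw path, which is the conservative reading of "do not synthesise".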
|
||||
|
||||
**Settles when:** indefinitely deferred; see the Open evidence gaps table. The settle criterion depends on a Ghidra mapping table (the `aaDCT` tables in `Lmx.dll`) that does not exist in `analysis/ghidra/` and has no owner. No current artifact in this repo produces the byte→status mapping. Reopen if a future capture or decompiled output produces evidence.
|
||||
|
||||
### R4 — Completion-only byte mapping
|
||||
|
||||
**Severity: P1** (significant blocker for typed completion semantics — ships verbatim)
|
||||
|
||||
`0x00`, `0x41`, `0xEF` are observed as raw 1-byte completion frames (`work_remain.md:164–174`). They get preserved as `RawOperationStatus { byte: u8 }` without typed promotion.
|
||||
|
||||
**Current best answer:** `Session::operation_status_events()` carries `RawOperationStatus { byte: u8 }` for these. Document as "preserved verbatim until mapping table is found." Same Ghidra dependency as R3.
|
||||
|
||||
**Settles when:** indefinitely deferred; see the Open evidence gaps table. The settle criterion depends on the same Ghidra mapping table as R3, which does not exist in `analysis/ghidra/` and has no owner. Reopen if a future capture or decompiled output produces evidence.
|
||||
|
||||
### R5 — Activate / Suspend behaviour
|
||||
|
||||
**Severity: P1** (significant blocker for Activate/Suspend consumers — surfaced as experimental)
|
||||
|
||||
`MxNativeCompatibilityServer.Suspend` and `Activate` return MxStatus but the trigger conditions beyond "pending/requesting" are unknown. The .NET reference does not call them on a live path.
|
||||
|
||||
**Current best answer:** expose `Session::suspend(item)` and `Session::activate(item)` returning `Result<MxStatus, Error>`. Document as experimental until a deployed scenario exercises them. Do not build callback-driven state transitions on top.
|
||||
|
||||
**Settles when:** a live capture shows the operation triggering an observable state change in `NmxSvc` plus a corresponding callback frame.
|
||||
|
||||
### R6 — `0x80004021` in `MxNativeSession.WriteSecuredAsync` is a .NET-reference defect, not a real LMX constraint
|
||||
|
||||
**Severity: P3** (formerly P1 — downgraded after `wwtools/mxaccesscli/` verification)
|
||||
|
||||
Original framing of this risk asserted that "`WriteSecured` (without `2`) returns `0x80004021` before sending the body" and concluded the single-token form was deprecated or rejected at the wire. That framing was wrong. Verification against `wwtools/mxaccesscli/` (a working CLI built on the production `LMXProxyServerClass` 32-bit COM proxy, i.e. the actual MxAccess surface) establishes:
|
||||
|
||||
- The LMX `WriteSecured` ALWAYS takes **two** user ids: `(currentUserId, verifierUserId, value)` (`wwtools/mxaccesscli/docs/api-notes.md:60-72`, `wwtools/mxaccesscli/src/MxAccess.Cli/Mx/MxItem.cs:69-70`).
|
||||
- "Single-user secured write" is the same API called with `currentUserId == verifierUserId` — it is **not** a separate API surface (`wwtools/mxaccesscli/src/MxAccess.Cli/Commands/WriteCommand.cs:151-155,196-199`).
|
||||
- `WriteSecured2` adds a timestamp parameter; it does **not** add a second token. The 1-vs-2 distinction in this design's earlier drafts was a confusion between "with timestamp" (Write2 vs Write) and "two-token" (which is always true).
|
||||
- The `0x80004021` failure observed in `src/MxNativeClient/MxNativeSession.cs:218-221` is therefore a defect of the .NET native reimplementation, not behaviour the LMX proxy itself produces.
|
||||
|
||||
**Current best answer:** `mxaccess` exposes `write_secured(reference, value, current_user_id, verifier_user_id)` (no timestamp) and `write_secured_at(reference, value, timestamp, current_user_id, verifier_user_id)` (with timestamp), matching `WriteSecured` and `WriteSecured2` respectively. Both always pass two user ids; callers performing single-user secured writes pass the same id twice. The `Error::Unsupported` mapping for "single-token form" has been removed from `50-error-model.md`.
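
The request shape implied above can be sketched as a plain data type. Everything here is hypothetical (`UserId`, `SecuredWrite`, and the constructor names are not taken from the `mxaccess` crate); the sketch only shows that both ids are always present and that a single-user write is the same request with the id repeated.

```rust
use std::time::SystemTime;

/// Hypothetical newtype for an LMX user id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct UserId(pub i32);

/// Hypothetical request shape: two user ids are always carried.
#[derive(Debug, Clone, PartialEq)]
pub struct SecuredWrite<V> {
    pub reference: String,
    pub value: V,
    /// `None` corresponds to `WriteSecured`, `Some` to `WriteSecured2`.
    pub timestamp: Option<SystemTime>,
    pub current_user_id: UserId,
    pub verifier_user_id: UserId,
}

impl<V> SecuredWrite<V> {
    /// Two-user secured write without a timestamp.
    pub fn new(reference: impl Into<String>, value: V, current: UserId, verifier: UserId) -> Self {
        Self {
            reference: reference.into(),
            value,
            timestamp: None,
            current_user_id: current,
            verifier_user_id: verifier,
        }
    }

    /// Single-user secured write: the same id is passed twice.
    pub fn single_user(reference: impl Into<String>, value: V, user: UserId) -> Self {
        Self::new(reference, value, user, user)
    }

    /// Adds the timestamp; it does not add another token.
    pub fn at(mut self, timestamp: SystemTime) -> Self {
        self.timestamp = Some(timestamp);
        self
    }
}
```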
|
||||
|
||||
**Settles when:** the `MxNativeSession.WriteSecuredAsync` defect is fixed in the .NET reference, or a captured frame shows the LMX proxy itself producing `0x80004021` on a `WriteSecured` call (which would resurrect the original framing). Default-positive: this likely settles silently as "not a real risk."
|
||||
|
||||
### R7 — Status mapping for non-success ASB cases
|
||||
|
||||
**Severity: P2** (nice-to-have / minor — unknown bytes preserved as raw)
|
||||
|
||||
`work_remain.md:132–143`: live probes have not yet exercised access-denied and no-communication on the current VM. The Rust port mirrors what the .NET reference proves; remaining ASB error/quality/detail bytes are preserved as raw and surfaced through `MxStatus.detail` until a safe live capture lands.
|
||||
|
||||
**Current best answer:** preserve unknown payloads. Document the gap.
|
||||
|
||||
**Settles when:** live capture against a configured access-denied tag and a no-communication endpoint produces the expected `MxStatus` shape.
|
||||
|
||||
## Implementation-level
|
||||
|
||||
### R8 — NTLMv2 cross-domain auth
|
||||
|
||||
**Severity: P1** (significant blocker for cross-domain deployments — single-domain ships)
|
||||
|
||||
Captured traffic is single-domain (local AVEVA install). Cross-domain NTLM requires AV pair handling that has not been tested.
|
||||
|
||||
**Current best answer:** implement AV pair parsing per [MS-NLMP] §2.2.2.1 and document `mxaccess-rpc` as untested across domains. Provide fixtures from any successful cross-domain probe.
|
||||
|
||||
**Settles when:** a cross-domain probe runs successfully end-to-end with packet-integrity signatures verified.
|
||||
|
||||
### R9 — DPAPI dependency for ASB
|
||||
|
||||
**Severity: P2** (nice-to-have / minor — explicit `shared_secret` constructor is the escape hatch)
|
||||
|
||||
ASB shared-secret retrieval uses `ProtectedData.Unprotect` (LocalMachine scope). Linux has no DPAPI. There is no portable replacement; the secret is encrypted at rest with a Windows-specific KCV.
|
||||
|
||||
**Current best answer:** `mxaccess-asb` requires Windows for the credential read path. Provide an explicit `AsbCredentials::shared_secret(secret: &[u8])` constructor that bypasses DPAPI for tooling that has the secret in plaintext (e.g. CI tests, ops automation).
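
A minimal sketch of the escape-hatch constructor, assuming `AsbCredentials` just owns the secret bytes. The DPAPI read path would sit behind `#[cfg(windows)]` and is deliberately not shown; the accessor name and the redacting `Debug` impl are illustrative choices, not the crate's confirmed API.

```rust
use std::fmt;

/// Hypothetical credential holder; the field layout is an assumption.
pub struct AsbCredentials {
    secret: Vec<u8>,
}

impl AsbCredentials {
    /// Builds credentials from a plaintext secret, bypassing DPAPI entirely.
    pub fn shared_secret(secret: &[u8]) -> Self {
        Self { secret: secret.to_vec() }
    }

    /// Hypothetical accessor for the transport layer.
    pub fn secret_bytes(&self) -> &[u8] {
        &self.secret
    }
}

/// Redact the secret so it cannot leak through logs or panic messages.
impl fmt::Debug for AsbCredentials {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("AsbCredentials")
            .field("secret", &"<redacted>")
            .finish()
    }
}
```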
|
||||
|
||||
**Settles when:** never. DPAPI is not portable; the escape hatch is the explicit constructor.
|
||||
|
||||
### R10 — Galaxy SQL schema versioning
|
||||
|
||||
**Severity: P1** (significant blocker per affected feature — break-loud on mismatch)
|
||||
|
||||
The recursive CTE in `GalaxyRepositoryTagResolver.cs` assumes the current AVEVA schema. Older Galaxy versions may have different table layouts.
|
||||
|
||||
**Current best answer:** target the schema that ships with the AVEVA version `MxNativeClient` validates against. Document the expected schema version. Break loudly on mismatch (`ConfigError::Galaxy { reason }`).
|
||||
|
||||
**Settles when:** a multi-version test matrix is set up. Probably not in V1.
|
||||
|
||||
### R11 — x86 proxy/stub workaround
|
||||
|
||||
**Severity: P2** (nice-to-have / minor — integration test catches binding-shape drift)
|
||||
|
||||
`NmxSvcps.dll` is x86-only. The replacement strategy bypasses the in-proc proxy by speaking ORPC directly. This works because we control both Type1/Type3 marshalling and `RemQueryInterface`. But it depends on `NmxSvc` continuing to expose IPv4 NCACN_IP_TCP bindings via the OXID.
|
||||
|
||||
**Current best answer:** add an `mxaccess-rpc` integration test that asserts `ResolveOxid` returns at least one `ncacn_ip_tcp` binding. Fail fast if the binding shape changes in a future AVEVA release.
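
The assertion itself is small. The sketch below assumes the OXID resolver result has already been parsed into a list of string bindings, each carrying a tower id and a network address, and that tower id `0x0007` identifies `ncacn_ip_tcp`. The `StringBinding` struct and function names are hypothetical.

```rust
/// Tower id for `ncacn_ip_tcp` in a DCOM string binding (assumption to be
/// confirmed against a captured `ResolveOxid` reply).
pub const TOWER_ID_NCACN_IP_TCP: u16 = 0x0007;

/// Hypothetical parsed form of one string binding.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StringBinding {
    pub tower_id: u16,
    pub network_address: String,
}

/// Returns the TCP bindings, or an error describing what was seen instead,
/// so a binding-shape change fails loudly with useful context.
pub fn require_tcp_binding(bindings: &[StringBinding]) -> Result<Vec<&StringBinding>, String> {
    let tcp: Vec<&StringBinding> = bindings
        .iter()
        .filter(|b| b.tower_id == TOWER_ID_NCACN_IP_TCP)
        .collect();
    if tcp.is_empty() {
        let seen: Vec<u16> = bindings.iter().map(|b| b.tower_id).collect();
        Err(format!("no ncacn_ip_tcp binding; tower ids seen: {:?}", seen))
    } else {
        Ok(tcp)
    }
}
```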
|
||||
|
||||
**Settles when:** that integration test is in CI gating.
|
||||
|
||||
### R12 — Performance — codec allocations
|
||||
|
||||
**Severity: P2** (nice-to-have / minor — micro-optimisation in M6)
|
||||
|
||||
The .NET reference reuses `byte[]` arrays via `MemoryPool`; the Rust port should use `bytes::Bytes` for zero-copy on receive and pre-allocate via `BytesMut` on encode. The codec currently allocates `Vec<u8>` per encode; tolerable for V1, worth optimising in M6.
|
||||
|
||||
**Current best answer:** use `BytesMut::with_capacity(MAX_FRAME)` per session. Bench in M6. Aim for < 5 allocations per write at steady state.
|
||||
|
||||
**Settles when:** `cargo bench` shows the target allocation count.
|
||||
|
||||
### R13 — DataUpdate `recordCount != 1` panic risk
|
||||
|
||||
**Severity: P1** (significant blocker for production stability — soft-error path documented)
|
||||
|
||||
`src/MxNativeCodec/NmxSubscriptionMessage.cs:71-74` hard-throws `ArgumentException` on any `0x33` DataUpdate whose `recordCount` is not exactly 1:
|
||||
|
||||
```csharp
|
||||
if (recordCount != 1)
|
||||
{
|
||||
throw new ArgumentException("Observed NMX DataUpdate callback parser currently supports one record per body.", nameof(inner));
|
||||
}
|
||||
```
|
||||
|
||||
R2 covers the missing **fixture** for the multi-record case, but the bigger production-side risk is separate: the first time AVEVA emits a multi-record `0x33` against a deployed Rust client, the codec — if it ports the .NET behaviour faithfully — will panic / return a hard decode error and tear down the subscription. We have no fixture proving multi-record bodies don't happen on real installs; we only have evidence they haven't happened on **our** install.
|
||||
|
||||
**Options:**
|
||||
1. Mirror the .NET reference and hard-error on `recordCount != 1`. Loud, but kills the session.
|
||||
2. Surface as a typed soft error (e.g. `ProtocolError::Decode { reason: "multi-record DataUpdate not yet supported" }`), log at warn, and drop the frame. The subscription stays alive; the consumer sees a single missed update, not a teardown.
|
||||
3. Speculatively decode multi-record (assume the per-record layout from the single-record case repeats) — explicitly forbidden by CLAUDE.md "Do not fabricate protocol behavior."
|
||||
|
||||
**Current best answer:** option 2 in Rust. Map the condition to `ProtocolError::Decode { reason: "multi-record DataUpdate not yet supported" }`, emit a `tracing::warn!` with the raw frame bytes attached as a hex field, and continue. Do **not** synthesise per-record decoding. The .NET-style hard throw stays as-is in the .NET reference (it is the executable spec, and a panic there is what produces the fixture we need — see R2). The Rust port deliberately diverges here on production-safety grounds, with the divergence documented in `50-error-model.md`.
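
The soft-error branch can be sketched as follows. This assumes the record count has already been read from the frame (its offset and width are not stated here, so it is passed in as a plain `u32`), and it uses `eprintln!` as a stand-in for `tracing::warn!` to stay dependency-free. Function names are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ProtocolError {
    Decode { reason: &'static str },
}

/// Rejects anything other than exactly one record, without panicking.
pub fn check_record_count(record_count: u32) -> Result<(), ProtocolError> {
    if record_count == 1 {
        Ok(())
    } else {
        Err(ProtocolError::Decode {
            reason: "multi-record DataUpdate not yet supported",
        })
    }
}

/// Processes a batch of `(record_count, body)` frames: single-record frames
/// are kept, others are logged with their raw bytes and dropped. Returns the
/// kept bodies and the number of dropped frames; the caller keeps running.
pub fn filter_data_updates<'a>(frames: &[(u32, &'a [u8])]) -> (Vec<&'a [u8]>, usize) {
    let mut accepted = Vec::new();
    let mut dropped = 0;
    for &(record_count, body) in frames {
        match check_record_count(record_count) {
            Ok(()) => accepted.push(body),
            Err(err) => {
                // Stand-in for `tracing::warn!` with the frame as a hex field.
                let hex: String = body.iter().map(|b| format!("{:02x}", b)).collect();
                eprintln!("warn: dropping DataUpdate frame: {:?} frame={}", err, hex);
                dropped += 1;
            }
        }
    }
    (accepted, dropped)
}
```

No per-record decoding is attempted for the dropped frames, in keeping with the "do not synthesise" rule.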
|
||||
|
||||
**Settles when:** R2's multi-record fixture lands and the codec gains a proven typed decode path; then R13 collapses into "supported, no special handling" and the soft-error branch becomes dead code that can be removed.
|
||||
|
||||
### R14 — Fabricated `0x80004021 → StaleItem` mapping
|
||||
|
||||
**Severity: P1** (significant blocker — fabrication risk; corrected in `50-error-model.md`)
|
||||
|
||||
A draft of `50-error-model.md` mapped `HRESULT 0x80004021` to a typed `StaleItem` error category for regular (non-secured) operations. **This mapping is unevidenced.**
|
||||
|
||||
- R6 already covers `0x80004021` on secured-write specifically: per `wwtools/mxaccesscli/` verification, this is a `MxNativeSession.WriteSecuredAsync` defect (the .NET native reimplementation throws `NotSupportedException` before reaching the wire), **not** a real LMX-proxy constraint. The production LMX surface accepts `WriteSecured` with two user ids unconditionally. R6 explicitly does **not** generalise the .NET defect to a typed "stale" error.
|
||||
- For regular operations, the actual stale-handle / invalid-arg HRESULT observed in captures is `0x80070057` (`E_INVALIDARG`). There is no captured frame, decompiled mapping table, or live probe in this repo that produces `0x80004021` on a non-secured path, and certainly none that justifies tagging it `StaleItem`.
|
||||
|
||||
This is a fabrication risk: the kind of "looks plausible from naming" mapping that CLAUDE.md "Do not fabricate protocol behavior" exists to prevent.
|
||||
|
||||
**Options:**
|
||||
1. Drop the `StaleItem` category entirely. Regular-op `0x80004021`, if ever observed, falls through to the generic `Hresult { code, hint: None }` branch with the raw HRESULT preserved.
|
||||
2. Keep `StaleItem` but rename the source HRESULT to `0x80070057` and require a captured fixture before promoting any frame to that category.
|
||||
3. Keep the `0x80004021 → StaleItem` mapping. **Forbidden** — no evidence backs it.
|
||||
|
||||
**Current best answer:** option 1 for V1. Surface unknown HRESULTs as `Error::Hresult { code }` and let consumers match on the raw value. `50-error-model.md` is being corrected in parallel (review cluster 3) to remove the `StaleItem` reference; this risk register entry exists so the mistake is recorded for future contributors and not silently re-introduced when someone reaches for an ergonomic typed name.
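
A minimal sketch of option 1, assuming a hypothetical `Error::Hresult` variant with the `hint` field from the option text. The only protocol fact used is the standard HRESULT rule that a set high bit means failure. No code is given a typed name.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Error {
    /// Raw HRESULT preserved verbatim; `hint` stays `None` until a capture
    /// or decompiled mapping table justifies a description.
    Hresult { code: u32, hint: Option<&'static str> },
}

/// Success when the severity bit is clear; otherwise the raw code is
/// surfaced unchanged. There is deliberately no `StaleItem` branch.
pub fn check_hresult(code: u32) -> Result<(), Error> {
    if code & 0x8000_0000 == 0 {
        Ok(())
    } else {
        Err(Error::Hresult { code, hint: None })
    }
}
```

Consumers that care about a specific value can match on `code` directly, which keeps the decision about what the value means on their side.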
|
||||
|
||||
**Settles when:** indefinitely deferred — no current artifact maps either `0x80004021` or `0x80070057` to a "stale handle" semantic, and inventing one violates the "don't fabricate protocol behaviour" rule. If a future capture or decompiled mapping table produces evidence, reopen as a typed-error proposal.
|
||||
|
||||
### R15 — Drop-time async cleanup hazards
|
||||
|
||||
**Severity: P1** (significant blocker — server-side handle leak on runtime shutdown)
|
||||
|
||||
`design/00-overview.md:38` states the principle "no spawn from inside Drop." `design/20-async-layer.md` and `design/50-error-model.md` describe Subscription drop semantics that fire `UnAdvise`/`UnregisterEngine` against the server. Reconciling these is non-trivial because:
|
||||
|
||||
- `tokio::spawn` from `Drop` panics if no Tokio runtime is current at drop time. A user dropping a `Session` from a `std::thread` after `Runtime::shutdown_timeout` returns will hit this.
|
||||
- During `Runtime::shutdown_timeout`, spawned tasks are aborted before they can flush. Even if a runtime is current, spawning the cleanup from `Drop` does not guarantee the unadvise/unregister actually reaches the server — the drop call returns immediately and the spawned task may be cancelled before the bytes hit the wire.
|
||||
- The result is a **server-side handle leak in `NmxSvc`**: subscriptions stay live, registered engines stay registered, until the TCP connection itself is torn down (which only happens once the kernel notices the socket is dead).
|
||||
|
||||
**Options:**
|
||||
1. Best-effort `tokio::spawn` from `Drop`. Documented hazard. Leaks on runtime shutdown and panics on no-runtime.
|
||||
2. Drop sends `UnAdvise`/`UnregisterEngine` via a `tokio::sync::oneshot` (or unbounded `mpsc`) to a long-lived connection task that owns the cleanup loop. **Drop itself never spawns** — it pushes a message onto the channel and returns. The connection task drains the channel until the TCP connection is itself dropped, at which point the server cleans up by socket close anyway.
|
||||
3. Require the consumer to call `Session::shutdown(timeout).await` and document Drop as "best-effort, may leak under shutdown" — no automatic cleanup at all.
|
||||
|
||||
**Current best answer:** option 2. A long-lived connection task owns the cleanup channel and drains it; `Drop` pushes an `UnAdvise`/`UnregisterEngine` request onto a `tokio::sync::oneshot` (one per resource) or a per-connection unbounded `mpsc` and returns synchronously. This keeps `Drop` cheap, satisfies "no spawn from Drop," and gives the cleanup a reasonable best-effort guarantee while the connection task is alive. **Runtime-shutdown leak window remains:** if the connection task is itself aborted by `Runtime::shutdown_timeout` before draining the channel, the cleanup messages are lost and the server-side handles remain registered until `NmxSvc` observes the TCP socket close. This window is documented in `50-error-model.md`'s cancellation semantics; consumers running under explicit shutdown should call `Session::shutdown(timeout).await` for deterministic cleanup. See `design/00-overview.md:38` (no-spawn-from-Drop principle), `design/20-async-layer.md` (Subscription drop semantics), and `design/50-error-model.md` (cancellation semantics).
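
The drop-side half of this design can be sketched without a runtime. The example below uses `std::sync::mpsc` as a dependency-free stand-in for the Tokio unbounded channel (an assumption made for the sketch; the sending call is likewise synchronous there). `Cleanup`, `Subscription::new`, and `drain_cleanup` are hypothetical names.

```rust
use std::sync::mpsc::{Receiver, Sender};

/// Hypothetical cleanup requests understood by the connection task.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Cleanup {
    UnAdvise { item_handle: u32 },
    UnregisterEngine,
}

pub struct Subscription {
    item_handle: u32,
    cleanup: Sender<Cleanup>,
}

impl Subscription {
    pub fn new(item_handle: u32, cleanup: Sender<Cleanup>) -> Self {
        Self { item_handle, cleanup }
    }
}

impl Drop for Subscription {
    fn drop(&mut self) {
        // Never spawns and never blocks: push the request and return.
        // A send error means the connection task is gone, in which case the
        // server cleans up on socket close, so the error is ignored.
        let _ = self.cleanup.send(Cleanup::UnAdvise {
            item_handle: self.item_handle,
        });
    }
}

/// What the connection task would do on each loop turn: take whatever
/// cleanup requests are queued without waiting for more.
pub fn drain_cleanup(rx: &Receiver<Cleanup>) -> Vec<Cleanup> {
    rx.try_iter().collect()
}
```

The leak window described above corresponds to the receiver being dropped, or never polled again, while requests are still queued.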
|
||||
|
||||
**Settles when:** the connection-task cleanup channel is implemented in M4, a stress test under churn confirms drop semantics on a live runtime do not leak, and the runtime-shutdown leak window is captured in a runnable test fixture (consumer drops `Session` after `Runtime::shutdown_timeout`; assert that the leak is bounded by socket-close timeout).
|
||||
|
||||
### R16 — Crypto/auth crate maintenance drift
|
||||
|
||||
**Severity: P1** (significant blocker — yank/advisory in CI breaks the build)
|
||||
|
||||
The auth surface area depends on a small cluster of marginal-maintenance crates. `design/30-crate-topology.md:130` pins `rc4`, `sha-1`, `md-5`, `num-bigint`; `design/10-raw-layer.md:252` instructs "Do not pull `ring` — hand-roll MD4." Of these:
|
||||
|
||||
- `rc4` is at minimum-maintenance, with a small maintainer pool and no recent releases.
|
||||
- `sha-1` v0.10 is the last RustCrypto release that ships with a deprecation warning (the algorithm itself, not the crate's quality, is what's deprecated upstream).
|
||||
- `md-5` and `num-bigint` are stable but not on the active-development frontier.
|
||||
- The hand-rolled MD4 in `mxaccess-rpc` has no upstream at all — it lives in this repo.
|
||||
|
||||
The risk is that any one of these crates gets **yanked**, picks up an `RUSTSEC` advisory, or stops compiling against a future Rust toolchain, and `cargo-deny` (or `cargo audit`) in CI fails the build for everyone — without any actual bug being found in our usage. This is especially bad if it happens during a live release window.
|
||||
|
||||
**Options:**
|
||||
1. Pin to known-good versions in workspace `Cargo.toml` and let CI break when an advisory lands. Triage manually.
|
||||
2. Pin **and** subscribe to `cargo-deny` advisory feeds with a documented response process; pre-stage replacement plans for each crate (e.g. "if `rc4` is yanked, fall back to a hand-rolled cipher in `mxaccess-rpc::crypto::rc4` — RC4 is ~30 LoC and we already hand-roll MD4").
|
||||
3. Hand-roll all of them up front (RC4, SHA-1, MD5, MD4 are all small) and depend on `num-bigint` only. Reduces the surface area to one external crate; increases the in-repo cryptographic LoC.
|
||||
|
||||
**Current best answer:** option 2 for V1. Pin to known-good versions in workspace `Cargo.toml`; subscribe to `cargo-deny` advisories in CI; document a fallback plan per crate (hand-rolled RC4 if `rc4` is yanked, hand-rolled SHA-1/MD5 if `sha-1`/`md-5` are pulled, swap `num-bigint` for `crypto-bigint` if it's pulled). Reassess in M6 and consider option 3 (hand-roll-everything) if any of the pins fire during V1 development. See `design/30-crate-topology.md:130` and `design/10-raw-layer.md:252`.
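
To give a sense of the size of the RC4 fallback, here is a minimal sketch of the textbook algorithm (key scheduling plus keystream generation). It is not the `rc4` crate's API and not code from `mxaccess-rpc`; the `Rc4` type and method names are hypothetical, and any real fallback should be checked against published test vectors and the captured NTLM traffic.

```rust
/// Textbook RC4 state: a 256-byte permutation and two indices.
pub struct Rc4 {
    s: [u8; 256],
    i: u8,
    j: u8,
}

impl Rc4 {
    /// Key-scheduling algorithm. Panics on an empty or over-long key.
    pub fn new(key: &[u8]) -> Self {
        assert!(!key.is_empty() && key.len() <= 256, "RC4 key must be 1..=256 bytes");
        let mut s = [0u8; 256];
        for (idx, slot) in s.iter_mut().enumerate() {
            *slot = idx as u8;
        }
        let mut j: u8 = 0;
        for i in 0..256 {
            j = j.wrapping_add(s[i]).wrapping_add(key[i % key.len()]);
            s.swap(i, j as usize);
        }
        Self { s, i: 0, j: 0 }
    }

    /// XORs the keystream into `data` in place. Encryption and decryption
    /// are the same operation; the state advances across calls.
    pub fn apply_keystream(&mut self, data: &mut [u8]) {
        for byte in data.iter_mut() {
            self.i = self.i.wrapping_add(1);
            self.j = self.j.wrapping_add(self.s[self.i as usize]);
            self.s.swap(self.i as usize, self.j as usize);
            let t = self.s[self.i as usize].wrapping_add(self.s[self.j as usize]);
            *byte ^= self.s[t as usize];
        }
    }
}
```

A yank rehearsal would swap this in behind the same internal interface and rerun the existing auth fixtures.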
|
||||
|
||||
**Settles when:** `cargo-deny check advisories` runs green in CI on a fresh advisory database, the workspace `Cargo.toml` pins are documented inline with their fallback plans, and a "yank rehearsal" (manually mark a pin as yanked locally and confirm the fallback compiles) has been done at least once per crate.
|
||||
|
||||
## Open questions
|
||||
|
||||
### Q1 — Where does the Rust workspace live? **(unresolved)**
|
||||
|
||||
`CLAUDE.md` proposes a sibling `rust/` directory at `c:\Users\dohertj2\Desktop\mxaccess\rust\`, but this is a *proposal*, not a confirmation: a glob of `rust/` confirms zero files exist there today, and `CLAUDE.md` itself hedges with "when it is started." **M0 cannot start until this is confirmed.**
|
||||
|
||||
**Owner:** project lead.
|
||||
|
||||
**Action:** confirm the path `c:\Users\dohertj2\Desktop\mxaccess\rust\` or pick an alternative location; create the empty `rust/` directory (or sibling) before M0 begins.
|
||||
|
||||
**Current best answer:** still pending. The CLAUDE.md proposal is the default and is what M0 will assume unless overridden, but treat this as an open decision rather than a confirmed answer.
|
||||
|
||||
**Settles when:** the workspace directory exists on disk and contains a `Cargo.toml` (even an empty one).
|
||||
|
||||
### Q2 — License? **(resolved: MIT)**
|
||||
|
||||
The .NET reference has no LICENSE file at the repo root. The Rust crates need one before publish.
|
||||
|
||||
**Resolved (2026-05-05):** **MIT** (single-license, not the dual `MIT OR Apache-2.0`). All workspace deps verified MIT/Apache-2.0 compatible; MIT alone satisfies every dep's downstream license obligation. `LICENSE` file added at the project root (`c:\Users\dohertj2\Desktop\mxaccess\LICENSE`). All crate `Cargo.toml`s set `license = "MIT"` via `workspace.package`.
|
||||
|
||||
**Settles when:** N/A — resolved.
|
||||
|
||||
### Q3 — Cross-platform reach (Linux, macOS)
|
||||
|
||||
The codec, ASB SOAP framing, and the async session are theoretically portable. Galaxy SQL via `tiberius` works on Linux. NTLM works on Linux. DPAPI does not. Active Directory authentication on Linux requires `gssapi` (Kerberos) which is out of scope.
|
||||
|
||||
**Current best answer:** Linux is a **stretch goal** for V1, not a supported target — consistent with `30-crate-topology.md`'s `mxaccess-codec` Targets line ("stretch goal") and `60-roadmap.md`'s "What this roadmap deliberately does not include" (Linux behind feature flags). If pursued, the path is `default-features = false` with the consumer providing credentials and shared secret explicitly. macOS unsupported in V1 (no Galaxy SQL TDS testing on macOS).
|
||||
|
||||
**Settles when:** a Linux integration test runs successfully against a remote AVEVA install. Until then, treat Linux support as aspirational and gate all Linux-specific code paths behind opt-in feature flags.
|
||||
|
||||
### Q4 — How does `mxaccess-compat` handle COM event sinks?
|
||||
|
||||
The .NET `MxNativeCompatibilityServer` raises `OnDataChange` etc. as COM events. `mxaccess-compat` is a Rust API; do we expose them as `Stream`s, callbacks, or both?
|
||||
|
||||
**Current best answer:** Streams, with a separate optional `mxaccess-compat-com` crate (post-V1) that registers `windows-rs`-generated COM classes. The compat crate's primary surface is Rust.
|
||||
|
||||
**Settles when:** a concrete consumer requests COM exposure.
|
||||
|
||||
### Q5 — How do we surface `MxStatus` in `Subscription` items vs `Session` operations?
|
||||
|
||||
For `Session::write()`, a non-Ok status maps to `Error::Status`. For `Subscription::next()`, a non-Ok status comes through as `DataChange { status: MxStatus, ... }` — it is not necessarily an error (a "stale" data change is still a valid frame).
|
||||
|
||||
**Current best answer:** `Session::write()` returns `Err` on non-Ok category. `Subscription::next()` returns `Ok(DataChange { ... })` and the consumer inspects `change.status`. Documented in `50-error-model.md`.
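
The asymmetry can be sketched with plain functions. All shapes below are hypothetical stand-ins (the real `MxStatus`, `DataChange`, and `Error` are defined in `50-error-model.md`, not here); the point is only that the same non-Ok status becomes `Err` on the write path and data on the subscription path.

```rust
/// Hypothetical, simplified status: `Ok` or some other category byte.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StatusCategory {
    Ok,
    Other(u8),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MxStatus {
    pub category: StatusCategory,
    pub detail: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Error {
    Status(MxStatus),
}

#[derive(Debug, Clone, PartialEq)]
pub struct DataChange {
    pub value: f64,
    pub status: MxStatus,
}

/// Write path: a non-Ok category is an error.
pub fn write_outcome(status: MxStatus) -> Result<(), Error> {
    match status.category {
        StatusCategory::Ok => Ok(()),
        StatusCategory::Other(_) => Err(Error::Status(status)),
    }
}

/// Subscription path: the frame is always delivered; the consumer inspects
/// `change.status` to decide whether the value is usable.
pub fn subscription_item(value: f64, status: MxStatus) -> Result<DataChange, Error> {
    Ok(DataChange { value, status })
}
```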
|
||||
|
||||
**Settles when:** API stabilises after consumer feedback.
|
||||
|
||||
### Q6 — Should `Session` be `Clone`?
|
||||
|
||||
Cheap clones via `Arc<SessionInner>` are convenient (handlers can take `Session` by value). But cloning makes shutdown semantics fuzzy: when does `UnregisterEngine` fire?
|
||||
|
||||
**Current best answer:** `Clone + Send + Sync`. Drop of the last clone requests `UnregisterEngine` best-effort by pushing onto the connection task's cleanup channel (per R15, `Drop` itself never calls `tokio::spawn`). `Session::shutdown(timeout)` is the explicit, awaitable way for production code.
|
||||
|
||||
**Settles when:** stress test under churn confirms drop semantics are correct.
|
||||
|
||||
### Q7 — M1 `hasDetailStatus` audit
|
||||
|
||||
During M1 wave-1 codec ports, the `subscription_message.rs` agent draft conditionally read the `status: i32` field only when `hasDetailStatus = true`, while requiring a minimum record length of 15 (DataUpdate) regardless. The result: 4 leading status bytes were left unconsumed, then misread as `quality` further down. The defect was caught by round-trip tests (`data_update_boolean_round_trip`, `data_update_has_no_correlation_id`) and fixed: `status: i32` is now read unconditionally per `src/MxNativeCodec/NmxSubscriptionMessage.cs:126-127`; only `detail_status: Option<i32>` is gated on `hasDetailStatus` (`NmxSubscriptionMessage.cs:130-134`).
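
The corrected gating can be illustrated in isolation. The sketch below assumes little-endian `i32` fields and starts at offset 0 of a slice positioned at the status field; the real record layout and offsets live in `NmxSubscriptionMessage.cs` and are not reproduced here. The type and function names are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RecordStatus {
    pub status: i32,
    pub detail_status: Option<i32>,
}

/// Reads one little-endian `i32` and advances the cursor.
fn read_i32_le(buf: &[u8], offset: &mut usize) -> Option<i32> {
    let bytes = buf.get(*offset..*offset + 4)?;
    *offset += 4;
    Some(i32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

/// `status` is read unconditionally; only `detail_status` is gated on the
/// flag. Returns the parsed fields and the number of bytes consumed.
pub fn parse_status_fields(buf: &[u8], has_detail_status: bool) -> Option<(RecordStatus, usize)> {
    let mut offset = 0;
    let status = read_i32_le(buf, &mut offset)?;
    let detail_status = if has_detail_status {
        Some(read_i32_le(buf, &mut offset)?)
    } else {
        None
    };
    Some((RecordStatus { status, detail_status }, offset))
}
```

The defect described above is equivalent to moving the first `read_i32_le` call inside the `if`, which leaves four bytes unconsumed whenever the flag is false.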
|
||||
|
||||
**Follow-up:** audit any other codec port (current or future) that takes a `has_detail_status` / `hasDetailStatus` parameter for the same defect pattern — specifically, verify that fields read unconditionally in the .NET source remain unconditional in the Rust port. Likely affected scope: any future helper that ports `ParseRecord` semantics from `NmxSubscriptionMessage.cs`. The inline note at `mxaccess-codec/src/subscription_message.rs` `parse_record` documents the fix.
|
||||
|
||||
**Settles when:** post-M1 audit confirms no other codec module conditionally skips fields the .NET reference reads unconditionally.
|
||||
|
||||
## Open evidence gaps
|
||||
|
||||
These are missing fixtures that the design assumes will land by their respective milestone.
|
||||
|
||||
| Fixture | Needed by | Captured how |
|
||||
|---|---|---|
|
||||
| Multi-sample buffered batch | M6 | provider tuning to exceed buffered queue threshold |
|
||||
| Cross-domain NTLM Type1/2/3 | M2+ | multi-domain AVEVA test harness |
|
||||
| Activate/Suspend transition | M6 | deployed object that goes pending |
|
||||
| `OperationComplete` for non-write op | indefinitely | unknown |
|
||||
| Ghidra mapping table for completion-only bytes (R3/R4) | indefinitely | Ghidra decompile of `Lmx.dll`'s `aaDCT` tables — table not yet present in `analysis/ghidra/` and has no owner |
|
||||
| ASB write timestamp + status fields | M5 | extended ASB Write/PublishWriteComplete probe |
|
||||
| ASB no-communication source-level evidence (`work_remain.md:198`) | M5 | live capture against an unconfigured ASB endpoint |
|
||||
| Partial-cleanup behavior after channel failure (`work_remain.md:196-197`) | M4/M5 | inject mid-flight failure during subscribe, observe cleanup state |
|
||||
| Galaxy schema older version | indefinitely | not in scope for V1 |
|
||||
|
||||
## Things that look risky but aren't
|
||||
|
||||
### "Decode the NDR-bridge to find the value bytes"
|
||||
|
||||
`docs/Transport-Correlation.md:65-70` notes that distinct value probes do not appear in raw TCP — the `CNmxAdapter::PutRequest`/`CNmxAdapter::TransferData` buffers are an "internal adapter representation, not the TCP wire format." This is because the values flow as DCE/RPC stub bytes *inside* the `TransferData` payload, which itself is the 46-byte envelope plus the inner write/advise/subscribe body. The "bridge" is just our codec re-applied at a different boundary; once we encode the envelope correctly, the bytes are there.
|
||||
|
||||
The .NET reference confirms this — `src/MxNativeClient/ManagedNmxService2Client.cs:159-183` (`TransferData` + `ValidateTransferDataBody`) writes the 46-byte envelope directly into the DCE/RPC Request stub body, then forwards the inner; the validator explicitly rejects bodies that lack "an inner message after the 46-byte envelope" (line 182). There is no extra layer. The probe-vs-pcap mismatch is an artefact of not reassembling the inner body, not a missing protocol layer.
|
||||
|
||||
**No risk.** Documented for clarity so future contributors don't chase a non-existent encryption layer.
|
||||
|
||||
### "We need a custom TLB / proxy DLL"
|
||||
|
||||
The .NET reference avoids registering a custom TLB by hand-rolling the callback `IRemUnknown` server in `src/MxNativeClient/ManagedCallbackExporter.cs:44-54` (`CreateCallbackObjRef` builds an OBJREF in memory) plus `src/MxNativeClient/ManagedCallbackExporter.cs:164,195-196` (the `IRemUnknown::RemQueryInterface` server-side handler returns the negotiated `INmxSvcCallback` IPID without any registry-resident TLB or proxy/stub DLL). The Rust port does the same in `mxaccess-callback`. The only registry touchpoint is OXID resolution (read-only) and reading the ASB shared secret (read-only via DPAPI). No installer, no admin elevation.
|
||||
|
||||
**No risk.** Documented because it commonly comes up in DCOM contexts.
|
||||
@@ -0,0 +1,31 @@
|
||||
# `design/` — Rust port architectural plan
|
||||
|
||||
This folder is the design contract for the Rust replacement of AVEVA/Wonderware MXAccess. It is the gap between the .NET reference in `src/` and the Rust crates that will be written under a sibling `rust/` workspace (per `CLAUDE.md`).
|
||||
|
||||
The folder is structured as a small set of focused documents. Read in order; each builds on the previous.
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `00-overview.md` | Mission, two-layer goal, architectural principles, non-goals |
|
||||
| `10-raw-layer.md` | Byte-accurate raw MXAccess layer (codec + transport + session) |
|
||||
| `20-async-layer.md` | Idiomatic Tokio async layer on top of the raw layer |
|
||||
| `30-crate-topology.md` | Cargo workspace, crates, dependencies, build/test commands |
|
||||
| `40-protocol-invariants.md` | Bill of materials: IIDs, opnums, envelope/handle bytes |
|
||||
| `50-error-model.md` | `MxStatus`, error types, panic/cancellation policy |
|
||||
| `60-roadmap.md` | Milestones M0..M6, validation strategy |
|
||||
| `70-risks-and-open-questions.md` | Parity gaps, unproven flows, cross-platform constraints |
|
||||
|
||||
The design is grounded in the .NET reference at `src/` and the protocol artifacts in `docs/`, `analysis/`, and `captures/`. **Do not introduce protocol behavior in these documents that is not already proven in the reference.** When adding a new claim about wire format, cite either:
|
||||
|
||||
- a `.cs` file path in `src/MxNativeCodec/`, `src/MxNativeClient/`, or `src/MxAsbClient/`, or
|
||||
- a `docs/*.md` spec file, or
|
||||
- a `captures/0NN-frida-*` directory or `analysis/frida/*.tsv` row.
|
||||
|
||||
This folder is documentation, not code. When the Rust workspace is created, the design here is the contract it must satisfy. When evidence in `captures/` invalidates a design decision here, update the design first, then the code.
|
||||
|
||||
## Reading order
|
||||
|
||||
- New contributor: 00 → 30 → 10 → 40 → 20 → 50 → 60 → 70.
|
||||
- Protocol question: 40 first, then the relevant section of 10.
|
||||
- API question: 20 first, then 50.
|
||||
- Planning a milestone: 60 first, cross-reference 70 for blockers.
|
||||
@@ -0,0 +1,157 @@
|
||||
# Adversarial design review
|
||||
|
||||
Generated: 2026-05-05. Reviewer: Claude `general-purpose` subagent (per-file, hostile framing).
|
||||
|
||||
This is a *challenge* review — not a style pass. Reviewers are instructed to question implementation choices, design tradeoffs, and assumptions; verify every load-bearing claim against the cited evidence in `src/` (.NET reference), `docs/`, `analysis/`, `captures/`; and surface where the design could fail under real-world conditions.
|
||||
|
||||
## Status (2026-05-05)
|
||||
|
||||
**All findings have been addressed across three triage passes.**
|
||||
|
||||
| Severity | Count | Status |
|
||||
|---|---|---|
|
||||
| `[BLOCKER]` | 24 | All resolved (cluster pass 1) |
|
||||
| `[MAJOR]` | ~26 | All resolved (cluster pass 2) |
|
||||
| `[MEDIUM]` | 3 | All resolved (cluster pass 1) |
|
||||
| `[MINOR]` | ~14 | All resolved (cluster pass 3) |
|
||||
| `[NIT]` | ~7 | All resolved (cluster pass 3) |
|
||||
|
||||
Each finding bullet below is prefixed with `[RESOLVED]` to indicate the design doc has been corrected. The original finding text is preserved verbatim as the audit trail; see `git log design/*.md` for the specific edits. New risks added in response: R13 (`recordCount != 1` panic risk), R14 (fabricated `0x80004021 → StaleItem` mapping), R15 (Drop-time async cleanup hazards), R16 (crypto/auth crate maintenance drift). Severity tiers (P0/P1/P2) were added to all R-items.
|
||||
|
||||
Severity-tag legend (in original review):
|
||||
- **BLOCKER** — must fix before implementation; protocol or safety bug, or load-bearing claim that is fabricated / contradicts evidence.
|
||||
- **MAJOR** — load-bearing assumption that is unsupported, under-specified, or likely wrong.
|
||||
- **MEDIUM** — moderate-severity finding falling between MAJOR and MINOR.
|
||||
- **MINOR** — clarity, consistency, or naming.
|
||||
- **NIT** — style / preference.
|
||||
|
||||
## design/00-overview.md
|
||||
|
||||
- [RESOLVED] `[BLOCKER]` Doc lists `register, write, advise, read` as `Transport` trait primitives (design/00-overview.md:61) and the public API (design/10-raw-layer.md "Read" section, line 199) confirms `ReadAsync` is "implemented as a transient subscription read". Putting `read` on the `Transport` trait alongside the actual wire primitives misrepresents the protocol — there is no NMX read primitive. `MxNativeSession.ReadAsync` (src/MxNativeClient/MxNativeSession.cs:312–351) implements read as `SubscribeAsync` + first-callback-result + dispose. If the `Transport` trait shape implies `read` is transport-level, every transport must reimplement the subscribe-then-cancel dance instead of getting it once at the session layer. This contradicts principle 1 ("do not fabricate protocol behavior") by inventing a wire primitive that does not exist.
|
||||
- [RESOLVED] `[MAJOR]` Diagram labels `mxaccess-asb-soap` as "NetTcp + SOAP" with "DH/HMAC/AES" (design/00-overview.md:75–77). The DH/HMAC/AES claim is supported (src/MxAsbClient/AsbSystemAuthenticator.cs:23–34, 73–122). But the "SOAP" label is misleading: the .NET reference uses `NetTcpBinding(SecurityMode.None)` with the default binary message encoder (src/MxAsbClient/MxAsbDataClient.cs:660–663). SOAP envelope bytes only exist as an in-memory infoset; on the wire it's MC-NMF-framed binary XML. design/70-risks-and-open-questions.md:9–18 (R1) confirms the plan is to "hand-roll the framing per [MS-NMF]" — no SOAP text framing. Crate name `mxaccess-asb-soap` and the diagram label will mislead implementers into thinking SOAP envelopes are on the wire.
|
||||
- [RESOLVED] `[MAJOR]` Principle 3 says "Raw layer is `unsafe`-free. No raw pointers, no `transmute`, no FFI in the public surface" (design/00-overview.md:33), but principle 6 mandates `windows-rs` for "OBJREF building, IPID/OXID/OID handling, GUID literals" (line 36) and the diagram shows `mxaccess-rpc` consuming `windows` crate (line 88). Every `windows-rs` COM call is `unsafe fn`. The .NET reference uses `Type.GetTypeFromProgID` + `Activator.CreateInstance` (src/MxNativeClient/ManagedNmxService2Client.cs:33–36); `windows-rs` exposes equivalents only via `unsafe { CoCreateInstance(...) }`. The doc never reconciles this — either "raw layer is `unsafe`-free in the public surface" (i.e. internal `unsafe` allowed) or `windows-rs` types stay out of `mxaccess-rpc`. As written, principles 3 and 6 are in direct tension.
|
||||
- [RESOLVED] `[MAJOR]` Principle 8 says "No spawn from inside `Drop`; no blocking calls inside `async fn`" (design/00-overview.md:38) but principle 2 in "Async layer" (line 10) promises "drop … cancellation" and 70-risks-and-open-questions.md:24 / R3 design references `Stream` cleanup. Drop-based cancellation of an in-flight subscription on `NmxSvc` requires sending an `unadvise`/`UnregisterReference` frame — that is I/O. If `Drop` cannot spawn and cannot block, the only options are (a) abandon the server-side subscription (an NMX leak in the service process), (b) require an explicit `close().await` and downgrade `Drop` to a panic-in-debug, or (c) hand the cleanup to a background reaper task (which itself was spawned at session start, not inside Drop). The doc never specifies which, and "drop cancellation" implies (a)/(c) without saying so. Compare `MxNativeSession.DisposeAsync`/explicit unregister flows (src/MxNativeClient/MxNativeSession.cs uses `Volatile`, recovery, etc.) — none are `Drop`-equivalent.
|
||||
- [RESOLVED] `[MAJOR]` Crate-mapping table (design/00-overview.md:22–25) splits `MxNativeClient` into `mxaccess-rpc`, `mxaccess-nmx`, `mxaccess-callback` for transport and puts `MxNativeSession` and `MxNativeCompatibilityServer` solely in the async `mxaccess` crate ("raw layer ends at transport"). But the .NET `MxNativeSession` is the only place where Galaxy resolution + handle lookup + correlation-id bookkeeping + recovery state are wired (src/MxNativeClient/MxNativeSession.cs:90–125, 312–351, 573). If `mxaccess-nmx` exposes only "INmxService2 client + envelope" (design/00-overview.md:69), then `Session` must reimplement every cross-cutting concern (subscription registry, recovery, callback routing) inside the async crate. Either the raw layer is incomplete (cannot be used standalone for "byte-level control" as line 12 promises, because there's no session/correlation surface), or the split is wrong. The 1:1 mapping in the table papers over this.
|
||||
- [RESOLVED] `[MEDIUM]` Principle 9 says "ASB-only paths return `Error::Unsupported` on `NmxTransport` and vice versa; capability is queryable" (design/00-overview.md:39), but non-goal #5 (line 48) says "the Rust port routes those [callback-only ops] to NMX; ASB owns the regular tag data plane only". A consumer holding an `AsbTransport` and calling `activate(item)` will get `Error::Unsupported` — but the natural expectation (set by the non-goals paragraph) is that the high-level `mxaccess` `Session` *transparently routes* callback-only ops to NMX even when ASB is the data plane. The doc never says how the dual-routing works at the `Session` level — does the session hold both transports? Is the user expected to construct two? Principle 9 ("two transports, one façade") and non-goal 5 are in tension.
|
||||
- [RESOLVED] `[MEDIUM]` Line 16: "every async method bottoms out in a sync codec call (`NmxTransferEnvelope.Encode`)". Searched src/MxNativeCodec — there is no `NmxTransferEnvelope.Encode`. The class is `NmxTransferEnvelopeTemplate` with `Encode(ReadOnlySpan<byte>)` (src/MxNativeCodec/NmxTransferEnvelopeTemplate.cs:33). Minor naming drift, but in a doc whose principle 1 is "do not fabricate" and which cites filenames as evidence, citing a non-existent method weakens the rhetorical case.
|
||||
- [RESOLVED] `[MEDIUM]` Principle 7 cites `dbo.gobject` / `dbo.instance` / `dbo.dynamic_attribute` as the SQL surface (design/00-overview.md:37). Verified at src/MxNativeClient/GalaxyRepositoryTagResolver.cs:215, 253, 257. However CLAUDE.md (project root) lists the surface as `aa_attribute` / `aa_object` / `mx_attribute_category`. Grepping the `.cs` file shows zero hits for any of those names. CLAUDE.md is wrong, the design doc is right, but the design doc should call this out as a CLAUDE.md correction — otherwise the next implementer who trusts CLAUDE.md will write SQL against tables that don't exist.
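
The first finding in this list (no NMX read primitive; read is subscribe + first callback + dispose) can be illustrated with a small synchronous Rust sketch. The trait shape, the method names, and the `FakeTransport` stand-in are hypothetical and deliberately simplified; the real design is async and the value type is not `i64`.

```rust
use std::collections::VecDeque;

/// Wire-level primitives only. Note the absence of `read`.
trait Transport {
    /// Starts a subscription and returns its id.
    fn advise(&mut self, tag: &str) -> u32;
    /// Returns the next callback value for `sub`, or `None` on timeout.
    fn next_callback(&mut self, sub: u32) -> Option<i64>;
    /// Cancels the subscription.
    fn unadvise(&mut self, sub: u32);
}

/// Session-level read, written once for every transport:
/// subscribe, take the first callback, always unsubscribe.
fn read<T: Transport>(transport: &mut T, tag: &str) -> Option<i64> {
    let sub = transport.advise(tag);
    let first = transport.next_callback(sub);
    transport.unadvise(sub); // also runs on the timeout path, so no advise leaks
    first
}

/// In-memory stand-in that records the calls it receives.
struct FakeTransport {
    pending: VecDeque<i64>,
    log: Vec<String>,
}

impl Transport for FakeTransport {
    fn advise(&mut self, tag: &str) -> u32 {
        self.log.push(format!("advise {tag}"));
        1
    }
    fn next_callback(&mut self, _sub: u32) -> Option<i64> {
        self.pending.pop_front()
    }
    fn unadvise(&mut self, sub: u32) {
        self.log.push(format!("unadvise {sub}"));
    }
}
```

Because `read` lives above the trait, a second transport only has to implement the three primitives; the subscribe-then-cancel sequence is not duplicated per transport.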
|
||||
|
||||
## design/10-raw-layer.md
|
||||
|
||||
- [RESOLVED] `[BLOCKER]` Doc Item-control table claims `Advise (plain)` opcode `0x1f` = 37 bytes, distinct from `AdviseSupervisory` `0x1f` = 39 bytes (design/10-raw-layer.md:99-104). The .NET reference does not support a 37-byte plain-advise variant: `NmxItemControlMessage.Parse` accepts only `AdviseSupervisory` or `UnAdvise` (src/MxNativeCodec/NmxItemControlMessage.cs:46-48), and `MxNativeCompatibilityServer.AdviseSupervisory` aliases plain `Advise` to it (src/MxNativeClient/MxNativeCompatibilityServer.cs:256-258). The doc's three-row table fabricates a "plain advise" length the codec rejects.
|
||||
- [RESOLVED] `[BLOCKER]` Doc claims Boolean write body has "1 value byte (0xFF/0x00)" giving 37 bytes total (design/10-raw-layer.md:114). Source actually emits 4 value bytes: `[0xff,0xff,0xff,0x00]` or `[0x00,0xff,0xff,0x00]` (src/MxNativeCodec/NmxWriteMessage.cs:257). Total still 37 because the suffix is 11+4 (not 14+4), but the per-byte breakdown in the doc is wrong — the "1 value byte" claim will lead a Rust port to emit a 34-byte body that the receiver rejects.
|
||||
- [RESOLVED] `[MAJOR]` Doc String-write row says "Total = 26 + N" (design/10-raw-layer.md:118). True size is `KindOffset(17)+1+4+4+N+14+4 = 44+N` (src/MxNativeCodec/NmxWriteMessage.cs:150). The row drops the 14+4 suffix that every other row in the same table includes — an inconsistency in the column meaning that will produce undersized buffers.
|
||||
- [RESOLVED] `[MAJOR]` Doc array-body row says "28 + N + 18" with layout "4-byte marker + count(u16) + width(u16) + elements" (design/10-raw-layer.md:120). Code shows count is at body offset 22 and width at 24, with two separate gap regions: bytes 18-21 are zero-padding and bytes 26-27 are zero-padding (src/MxNativeCodec/NmxWriteMessage.cs:179-184). The "4-byte marker" framing implies a single contiguous marker before count; in reality it's a 4-byte gap *and* a 2-byte gap surrounding the count/width. Documenting it that way will round-trip wrong on captures.
|
||||
- [RESOLVED] `[MAJOR]` Doc claims `MxStatus.source: MxStatusSource` (design/10-raw-layer.md:178). The .NET reference field name is `DetectedBy` (src/MxNativeCodec/MxStatus.cs:31). Drift in field name across the parity boundary; tests that round-trip serialize will diverge.
|
||||
- [RESOLVED] `[MAJOR]` Doc claims wire kinds `0x41–0x46` for arrays in `MxValue`/write (design/10-raw-layer.md:120, 149). Encoder side never emits `0x46` — `NmxWriteMessage.GetWireKind` collapses `StringArray` and `DateTimeArray` both to `0x45` (src/MxNativeCodec/NmxWriteMessage.cs:107). Decoder side does treat `0x46` as DateTimeArray (src/MxNativeCodec/NmxSubscriptionMessage.cs:173, 275). The doc lumps both directions as "0x41..0x46", masking an asymmetry the Rust port must replicate.
|
||||
- [RESOLVED] `[MAJOR]` Doc says "`Duration` for `ElapsedTime`: 4-byte milliseconds on the wire" (design/10-raw-layer.md:220). Source decodes the wire as **signed** `i32` (`BinaryPrimitives.ReadInt32LittleEndian`, src/MxNativeCodec/NmxSubscriptionMessage.cs:252) and produces `TimeSpan.FromMilliseconds(milliseconds)`. Rust `std::time::Duration` is unsigned — a negative ms value (which the encoding allows) will panic or be clamped. Either spec a signed type (`time::Duration` or `i64` ms) or document the negative-handling policy.
|
||||
- [RESOLVED] `[MAJOR]` Doc DataUpdate parser sketch says `recordCount(i32, typically 1)` and shows generic `records[recordCount]` (design/10-raw-layer.md:144-147). The .NET reference hard-rejects any record count != 1: `if (recordCount != 1) throw` (src/MxNativeCodec/NmxSubscriptionMessage.cs:71-74). The Rust sketch implies general support for N records that the executable spec does not provide; either lift the constraint with capture evidence or document it as an asserted invariant.
|
||||
- [RESOLVED] `[MAJOR]` Doc says NTLM Type1 negotiate flags are "Unicode | RequestTarget | Sign | Seal | ExtendedSessionSecurity | Negotiate128 | KeyExchange" (design/10-raw-layer.md:246). Searched `ManagedNtlmClientContext.cs` for the actual flag set used in the negotiate emission — the file does derive sign/seal/sequence keys (src/MxNativeClient/ManagedNtlmClientContext.cs:177-200) and uses `_user.ToUpperInvariant()` for response key derivation (line 79), but the doc lists no citation for the Type1 flag bitfield. No line/offset is referenced; the flag list is a claim a Rust port could mis-encode without a fixture. Add a citation or capture.
|
||||
- [RESOLVED] `[MAJOR]` Doc puts `NmxTransferEnvelope.reserved` as `i32` "preserved from observed; default 0" (design/10-raw-layer.md:78-79). The .NET encoder unconditionally writes 0 there (src/MxNativeCodec/NmxTransferEnvelope.cs:91) and `Parse` does not extract or expose the reserved bytes at all — there is no preservation. The doc's "round-trip preserver" promise (design/10-raw-layer.md:92) cannot be satisfied unless the Rust codec adds a field the .NET reference has thrown away. Either fix the .NET reference or drop the claim.
|
||||
- [RESOLVED] `[MAJOR]` Doc says SubscriptionStatus has `recordCount(i32) + operationId(GUID 16) + correlationId(GUID 16) + records[recordCount]` immediately after `cmd+version` (design/10-raw-layer.md:135-140). Source orders fields the same but reads `recordCount` at offset 3, `operationId` at 7, `correlationId` at 23, records at 39 (src/MxNativeCodec/NmxSubscriptionMessage.cs:54-55, 98-99). Total agrees, but the diagram's `+ correlationId` includes records-bearing offsets that the .NET parser splits into two distinct paths (DataUpdate has *no* correlationId; SubscriptionStatus does). Doc places correlationId in both records (line 135-140 union) — Rust port must not parse correlationId for `0x33`.
|
||||
- [RESOLVED] `[MAJOR]` Doc warns about `.NET ToLowerInvariant()` vs Rust `str::to_lowercase()` Unicode divergence and proposes a `_legacy` variant using "`unicase::Ascii::to_lowercase`" (design/10-raw-layer.md:66-68). `unicase::Ascii::to_lowercase` only handles ASCII — it is *not* a substitute for `ToLowerInvariant()` on non-ASCII Galaxy tag names. If the captured tags include non-ASCII (Turkish, German), the proposed legacy fallback will produce a *different* CRC than the .NET reference, not the same one. The mitigation as written makes the divergence worse.
|
||||
- [RESOLVED] `[MAJOR]` `MxReferenceHandle` Rust struct uses `pub` for every primitive field including fields recomputed from name signatures (design/10-raw-layer.md:28-41). The `original: Bytes` preservation pattern (design/10-raw-layer.md:226) cannot apply here because the struct is `Copy` with no buffer. A consumer mutating `object_signature` directly will desync from `compute_name_signature(tag_name)` — there is no invariant binding the two together. Either expose the handle as opaque with accessors, or document that signatures are caller-owned and the codec will not recompute.
|
||||
- [RESOLVED] `[NIT]` Doc names callback opnums `DataReceivedRaw` (3) and `StatusReceivedRaw` (4) (design/10-raw-layer.md:296). Source names them `DataReceived`/`StatusReceived` (src/MxNativeClient/NmxSvcCallbackMessages.cs:11-12, NmxProcedureMetadata.cs:89-101). The `Raw` suffix is doc-invented, will diverge from any cross-reference grep against the .NET reference.
|
||||
- [RESOLVED] `[NIT]` Doc groups the `ElapsedTime` wire kind `0x07` with the array set "0x41..0x46" of decoded values (design/10-raw-layer.md:149). It belongs to the scalar set 0x01..0x07. Separately, `MxValueKind` does not enumerate `ElapsedTimeArray`, and `MxValue` also lacks one (design/10-raw-layer.md:200-215). If `0x47` ever appears in a capture, both .NET and the proposed Rust enums would silently drop it. Worth flagging as a known gap.
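
Two of the arithmetic points in this list (write-body totals and the signed `ElapsedTime` field) can be written down as a short Rust sketch. The constants follow the byte counts quoted in the findings; the function names, the `NegativeElapsed` type, and the choice to reject negative values (rather than use a signed duration type) are assumptions for illustration only.

```rust
use std::time::Duration;

/// Offset of the wire-kind byte: cmd(1) + version(2) + handle projection(14).
const KIND_OFFSET: usize = 17;

/// Boolean write: kind(1) + 4 value bytes + 11 + 4 suffix bytes = 37.
const fn boolean_write_len() -> usize {
    KIND_OFFSET + 1 + 4 + 11 + 4
}

/// String write with an N-byte payload: 17 + 1 + 4 + 4 + N + 14 + 4 = 44 + N.
const fn string_write_len(payload_len: usize) -> usize {
    KIND_OFFSET + 1 + 4 + 4 + payload_len + 14 + 4
}

/// Error carrying the negative millisecond count seen on the wire.
#[derive(Debug, PartialEq, Eq)]
struct NegativeElapsed(i32);

/// Decodes the 4-byte ElapsedTime field as signed little-endian milliseconds
/// and reports negative values as an error instead of panicking or clamping.
fn decode_elapsed(wire: [u8; 4]) -> Result<Duration, NegativeElapsed> {
    let ms = i32::from_le_bytes(wire);
    u64::try_from(ms)
        .map(Duration::from_millis)
        .map_err(|_| NegativeElapsed(ms))
}
```

Returning an error is only one possible negative-handling policy; a signed duration type or a raw `i64` millisecond field would preserve the value instead.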
|
||||
|
||||
## design/20-async-layer.md
|
||||
|
||||
- [RESOLVED] `[BLOCKER]` `Session::write` claims a single `&str` reference is sufficient, but `MxNativeSession.WriteAsync` requires `writeIndex` and `clientToken` parameters that drive correlation of `OperationStatus` callbacks (src/MxNativeClient/MxNativeSession.cs:165-185). The Rust API at design/20-async-layer.md:32-49 has no clientToken, so the `await` cannot await a wire `WriteCompleted`; it can only confirm the LMX `Write` RPC return code. This contradicts the claim that "write" is a true async operation that completes when the wire confirms — section 310 even concedes the 5-byte completion frame is observed-only. The single-shot `await` model is misleading.
|
||||
- [RESOLVED] `[BLOCKER]` `write_secured` is offered unconditionally (design/20-async-layer.md:40-45) but the .NET reference explicitly throws `NotSupportedException` for it, citing `0x80004021` from captures 036/038/039 (MxNativeSession.cs:211-221). The Rust API surface promises a behaviour the proven stack does not deliver. Either drop it or rename to `write_secured2` to match `WriteSecured2Async`.
|
||||
- [RESOLVED] `[BLOCKER]` `read` is described as a one-shot `read(&str) -> DataChange` (design/20-async-layer.md:47-49) with no timeout argument. The .NET reference's `ReadAsync` is explicitly a timed advise/first-callback/unadvise dance and *requires* `TimeSpan timeout > 0` (MxNativeSession.cs:312-359). The Rust API has no semantic for the case where no callback ever arrives — `tokio::time::timeout` can drop the future, but on drop the Rust design does not say it issues `UnAdvise` (only `Subscription` drop does). This leaks an advise on every read timeout.
|
||||
- [RESOLVED] `[MAJOR]` `Subscription` is declared `Stream<Item = Result<DataChange, Error>>` (design/20-async-layer.md:70) without specifying whether `Err` is terminal. The .NET reference fans out *records* via `CallbackReceived` and routes parse errors to a separate `UnparsedCallbackReceived` event (MxNativeSession.cs:590-607). The Rust design conflates these. If `Err` terminates the stream, transient parse errors kill an otherwise healthy subscription; if `Err` is recoverable, downstream `while let Some(Ok(_))` consumers silently skip data. Pick one and document, or split into two streams matching .NET.
|
||||
- [RESOLVED] `[MAJOR]` Drop-cancellation of `Subscription` claims to "send `UnAdvise` (best-effort, fire-and-forget via `tokio::spawn`)" (design/20-async-layer.md:70). On runtime shutdown `tokio::spawn` from a `Drop` impl panics if no runtime is current, and during `Runtime::shutdown_timeout` spawned tasks are aborted before they can flush. The .NET reference disposes synchronously, sending `UnAdvise` per subscription on the same thread (MxNativeSession.cs:483-495). Document the runtime-shutdown path or provide an explicit `async fn close()`.
|
||||
- [RESOLVED] `[MAJOR]` `Session: Clone + Send + Sync` with shared `Arc<SessionInner>` (design/20-async-layer.md:27) but no explicit `close()` API. Last-clone-drop running `unregister_engine` "best-effort" requires either `tokio::spawn` (same shutdown hazard as above) or `block_on` (forbidden by section 305). The .NET reference is `IDisposable` synchronous and unregisters explicitly. The design has no answer for "I want to make sure the engine is unregistered before my process exits."
|
||||
- [RESOLVED] `[MAJOR]` Recovery is presented as automatic on heartbeat-loss (design/20-async-layer.md:114), but `MxNativeSession.RecoverConnection*` is **explicitly caller-driven** — the .NET API exposes `RecoverConnectionAsync(policy)` and never auto-starts (MxNativeSession.cs:383-440). Worse: during recovery, `_recoveryActive` is just a flag set on inbound callbacks; in-flight writes against `_service` are *not* paused or replayed. The Rust design's promise that "the future resumes on the new connection" is unbacked — port the .NET semantics (concurrent calls fail, caller decides) or capture the gap.
|
||||
- [RESOLVED] `[MAJOR]` `subscribe_buffered` (design/20-async-layer.md:87-100) returns `Stream<DataChangeBatch>` but the .NET equivalent `RegisterBufferedItemAsync` takes `itemDefinition`, `itemContext`, and crucially an `itemHandle: int` (MxNativeSession.cs:272-310). The Rust API drops the handle and the `(definition, context)` split, hiding the dual-string requirement. A consumer cannot reproduce the captured Frida bodies through this API.
|
||||
- [RESOLVED] `[MAJOR]` `subscribe_many(&["A.X","A.Y","A.Z"])` is described as multiplexing one callback channel and demultiplexing by correlation ID (design/20-async-layer.md:74-82). The .NET reference issues per-tag `AdviseSupervisory` with one `CorrelationId` each (MxNativeSession.cs:250-270); there is no atomicity. If the second `Advise` errors, the design does not specify whether the first is rolled back. A consumer expects either all-or-nothing or partial success surfaced — the doc says neither.
|
||||
- [RESOLVED] `[MAJOR]` `Transport` trait uses `#[async_trait]` (design/20-async-layer.md:168), which boxes the returned future (one heap allocation per call) and bypasses native `async fn` in traits (stable since Rust 1.75). Switching to native AFIT later would not be a drop-in change: a trait with `async fn` methods is **not dyn-safe**, so `dyn Transport` would need methods that explicitly return `Pin<Box<dyn Future>>` or an equivalent RPITIT workaround. (`callbacks(&self) -> CallbackStream` returns a concrete struct, which is fine.) Pick `#[async_trait]` (legacy) or document that `Transport` is generic-only.
|
||||
- [RESOLVED] `[MAJOR]` `RecoveryEvent` enum (design/20-async-layer.md:122-127) is missing `WillRetry: bool` from `MxNativeRecoveryFailureEvent` (MxNativeSession.cs:47-51). Consumers cannot distinguish "this attempt failed but the policy will retry" from "terminal failure." Without it, downstream code cannot decide when to tear down its own state.
|
||||
- [RESOLVED] `[MAJOR]` `quality: u16` on `DataChange` (design/20-async-layer.md:222) but the .NET `MxStatus` model has its own categories (`MxStatusCategory`/`MxStatusSource`) and the codec already exposes `MxStatus`. Exposing a raw `u16` next to `status: MxStatus` invites callers to use the wrong field — the .NET reference uses `Record.ToDataChangeStatus()` (MxNativeSession.cs:70) as the canonical projection. Drop the `u16` or document the precedence.
|
||||
- [RESOLVED] `[MINOR]` `set_recovery_policy` takes `&mut session` in the sample (design/20-async-layer.md:288) but `Session` is `Clone` and `Arc`-backed (:27). Mutation through `&mut` on a clone is a foot-gun: clones won't see policy changes unless they're stored behind interior mutability. Either make it `&self` with `tokio::sync::watch` or document that it must be called before any clone is made.
|
||||
- [RESOLVED] `[MINOR]` `&'static str` in `Error::Unsupported` (design/20-async-layer.md:204) prevents formatting the offending operation with runtime context (e.g. capability name + transport variant). Use `Cow<'static, str>` or a structured variant.
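
The last finding's `Cow<'static, str>` suggestion could look roughly like the Rust sketch below. The single-variant `Error` enum and both constructor helpers are hypothetical stand-ins, not the error model in design/50-error-model.md.

```rust
use std::borrow::Cow;
use std::fmt;

#[derive(Debug)]
enum Error {
    /// Operation not available on this transport.
    Unsupported(Cow<'static, str>),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Unsupported(what) => write!(f, "unsupported: {what}"),
        }
    }
}

impl std::error::Error for Error {}

/// Static message: borrows the literal, no allocation.
fn unsupported_static() -> Error {
    Error::Unsupported(Cow::Borrowed("activate"))
}

/// Runtime context: allocates only when the message is built dynamically.
fn unsupported_on(operation: &str, transport: &str) -> Error {
    Error::Unsupported(Cow::Owned(format!("{operation} on {transport}")))
}
```

Existing call sites that pass a string literal keep their zero-allocation behaviour, while new call sites can attach the capability name and transport variant.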
|
||||
|
||||
## design/30-crate-topology.md
|
||||
|
||||
- [RESOLVED] `[BLOCKER]` `quick-xml` is the wrong dep entirely. NetTcpBinding default uses BINARY message encoder (.NET MC-NMF + MC-NBFX/NBFS dictionary tables), not XML over the wire. There is no XML envelope to parse; framing is binary records with dictionary string interning (design/30-crate-topology.md:130, :247). Evidence: src/MxAsbClient/MxAsbDataClient.cs:660–685 constructs `NetTcpBinding(SecurityMode.None)` with no override — default binding element is `BinaryMessageEncodingBindingElement`. The `mxaccess-asb-soap` crate name itself is a misnomer; needs MC-NMF/MC-NBFX framing, not SOAP/XML. `quick-xml` may still be needed for the small ASB control-plane XML payloads (`request.ToXml()` at AsbSystemAuthenticator.cs:79), but cannot frame net.tcp.
|
||||
- [RESOLVED] `[BLOCKER]` `mxaccess-asb-soap` is `publish = false` while `mxaccess-asb` (publishable) depends on it (design/30-crate-topology.md:128–138). Cargo refuses `cargo publish` on a crate whose path-dep lacks a published version. Either publish both, or fold the framing module into `mxaccess-asb`.
|
||||
- [RESOLVED] `[BLOCKER]` `rc4 = "0.1"` does not match crates.io. Latest is `rc4 v0.2.0` and the 0.1 line is unmaintained (design/30-crate-topology.md:245). Worse: `rc4` is published by RustCrypto but flagged stale; for NTLMv2 seal/sign most projects pull `cipher` + a manual ARC4 or use `ntlm-rs`/equivalent. Also `0.1` predates the `cipher` 0.4 trait reform that `aes 0.8`/`hmac 0.12` were built on.
|
||||
- [RESOLVED] `[MAJOR]` Pinned crypto versions form an inconsistent generation. `aes = "0.8"`, `hmac = "0.12"`, `md-5 = "0.10"`, `sha-1 = "0.10"`, `pbkdf2 = "0.12"` are the older `cipher 0.4`/`digest 0.10` line, but the design says "1.83+ stable" (design/30-crate-topology.md:241–250). Current crates pulled from index are `aes 0.9`, `hmac 0.13`, `md-5 0.11`, `pbkdf2 0.13` — all bumped to `digest 0.11`/`cipher 0.5` and require `rust-version: 1.85`. Mixing 0.10/0.12-line traits with 0.13-line will fail to resolve a coherent `digest`. Either pin the whole RustCrypto generation to one line, or bump MSRV to 1.85 to match.
|
||||
- [RESOLVED] `[MAJOR]` `sha-1 = "0.10"` is explicitly deprecated by upstream: "This crate is deprecated! Use the sha1 crate instead." (design/30-crate-topology.md:243). Pinning a deprecated crate in a fresh greenfield workspace is gratuitous.
|
||||
- [RESOLVED] `[MAJOR]` `windows = "0.58"` is significantly stale; current is `0.62.2` (design/30-crate-topology.md:253). Between 0.58 and 0.62 the COM, RPC and Cryptography modules saw breaking renames (e.g. `Win32_System_Rpc` surface trimmed, `Security_Cryptography` reorganised). Designing against 0.58 then "upgrading later" wastes work. Pin to 0.62.x.
|
||||
- [RESOLVED] `[MAJOR]` `tiberius = "0.12"` + `auth-windows` claim. `tiberius` 0.12.3 default features include `winauth` (SSPI) only on Windows; integrated security against MSSQL via SSPI is supported, but the design lists `mxaccess-galaxy` as "All Rust targets (TDS works cross-platform)" while only providing `auth-windows` for integrated security (design/30-crate-topology.md:94–96, :203). On Linux, the only auth options are SQL logins or `integrated-auth-gssapi` (Kerberos). Galaxy databases in practice are domain-joined Windows boxes using NTLM/Kerberos integrated auth — Linux clients won't work without `integrated-auth-gssapi` and a configured KDC. The doc claims cross-platform without flagging this.
|
||||
- [RESOLVED] `[MAJOR]` `num-bigint = "0.4"` for DH ModPow is not constant-time (design/30-crate-topology.md:251). The .NET reference uses `BigInteger.ModPow` which is also not constant-time, but the Rust port has the chance to use a constant-time bignum (`crypto-bigint`). Since the DH exponent is the long-lived private key (AsbSystemAuthenticator.cs:153–166), a side-channel-leaky `mod_exp` re-creates a defect, not parity. Flag as a security regression vs. an opportunity.
|
||||
- [RESOLVED] `[MAJOR]` Crate boundary: `mxaccess-galaxy` is split out, but the .NET reference keeps `GalaxyRepositoryTagResolver.cs` inside `MxNativeClient` namespace (`namespace MxNativeClient;` at line 4) (design/30-crate-topology.md:117–122). Splitting into a separate crate forces `mxaccess-nmx` → `mxaccess-galaxy`, but the resolver returns `MxReferenceHandle` (a `mxaccess-codec` type) and is consumed by NMX register flows — fine, no cycle. However, `mxaccess-callback` does NOT need `mxaccess-galaxy`, yet the diagram routes everything through `mxaccess-nmx` which depends on both. Minor coupling concern: the resolver pulls `tiberius` (heavy, native-tls/rustls/winauth) into every consumer of `mxaccess-nmx`. Should be a feature-gated optional dep.
|
||||
- [RESOLVED] `[MAJOR]` Feature `dpapi` default-on for Windows under `mxaccess-asb` — but the design does not mention `#[cfg(feature = "dpapi")]` boundaries inside `mxaccess-asb` (design/30-crate-topology.md:139, :202). The ASB shared secret is mandatory for the DH passphrase derivation (AsbSystemAuthenticator.cs:28, :134–142). With `dpapi=off` and no alternate secret source spec'd, the crate cannot authenticate at all. Either remove `dpapi` as optional, or define an explicit `SecretProvider` trait that DPAPI plugs into.
|
||||
- [RESOLVED] `[MAJOR]` `clippy::unwrap_used = deny` interaction with the error model (design/30-crate-topology.md:192, design/50-error-model.md:30). `Arc<str>` construction from `&str` via `Arc::<str>::from(&*s)` is fine, but `TypeMismatch { reference: Arc<str>, ... }` formatted via `thiserror` `#[error("... {reference} ...")]` requires `Arc<str>: Display`, which it is — no unwrap path. Real risk: any place the codec parses a UTF-16LE name and calls `String::from_utf16(...).unwrap()` will trip the lint. The design needs an explicit "fallible UTF-16 decode helper" rule. Worth flagging because `MxReferenceHandle` parsing is core.
|
||||
- [RESOLVED] `[MAJOR]` MSRV 1.83 vs. dependency versions. `uuid v1` features `["v4", "v7"]` — `uuid 1.23.1` requires `rust-version: 1.85.0` (design/30-crate-topology.md:188, :228, :233). Same for current `tiberius` indirect deps. Pinning MSRV to 1.83 while pulling `uuid = "1"` (= latest) is contradictory. Either pin `uuid = "=1.10"` (last 1.83-compatible) or raise MSRV to 1.85.
|
||||
- [RESOLVED] `[MINOR]` "Pinned to a recent stable Rust via `rust-toolchain.toml`. MSRV equals the pinned version (no separately-stated MSRV — pinning is the contract)" contradicts `rust-version = "1.83"` in the workspace skeleton (design/30-crate-topology.md:188, :228). Pick one policy.
|
||||
- [RESOLVED] `[MINOR]` Build commands include `cargo run --example connect-write-read` but the workspace example-target only resolves at the workspace root if a single crate owns examples (design/30-crate-topology.md:20–27, :181–183). With nine crates and `examples/` at the workspace root (line 20), `cargo run --example` will fail unless examples live inside a specific crate (typically `mxaccess`).
- [RESOLVED] `[MINOR]` License "MIT OR Apache-2.0 — to be confirmed" (design/30-crate-topology.md:226, :269). `tiberius` is `MIT/Apache-2.0`; everything else verified is MIT-or-Apache-2.0 compatible. No GPL/AGPL contamination found in the pinned set. Risk is `windows-rs` proxy/stub IDL re-emissions if any are vendored from Microsoft headers — flag for legal review when the codec ports OBJREF/OXID structs derived from MIDL output.
- [RESOLVED] `[NIT]` "Edition 2021 initially. Edition 2024 once stable across the dependency graph" (design/30-crate-topology.md:189). Edition 2024 has been stable since Rust 1.85 (2025-02). Given pinned deps already require 1.85, just go edition 2024.
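
The "fallible UTF-16 decode helper" rule requested in the `clippy::unwrap_used` finding above could look roughly like the sketch below. The names `decode_utf16le` and `Utf16DecodeError` are hypothetical and do not come from the design; the only point illustrated is that the decode path returns a `Result`, never calls `unwrap()`, and never indexes a slice.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; the design does not name one.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Utf16DecodeError {
    /// Input length in bytes was odd, so the last code unit is incomplete.
    OddLength(usize),
    /// A surrogate code unit appeared without its partner.
    UnpairedSurrogate,
}

impl fmt::Display for Utf16DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            Utf16DecodeError::OddLength(n) => {
                write!(f, "UTF-16LE input has odd byte length {}", n)
            }
            Utf16DecodeError::UnpairedSurrogate => {
                write!(f, "UTF-16LE input contains an unpaired surrogate")
            }
        }
    }
}

impl std::error::Error for Utf16DecodeError {}

/// Decode a UTF-16LE byte slice into a `String` without any panicking path.
pub fn decode_utf16le(bytes: &[u8]) -> Result<String, Utf16DecodeError> {
    if bytes.len() % 2 != 0 {
        return Err(Utf16DecodeError::OddLength(bytes.len()));
    }
    // Low byte first, then high byte, for each code unit.
    let units = bytes.chunks_exact(2).filter_map(|pair| match pair {
        &[lo, hi] => Some(u16::from_le_bytes([lo, hi])),
        _ => None, // not reachable: chunks_exact(2) yields only 2-byte slices
    });
    char::decode_utf16(units)
        .collect::<Result<String, _>>()
        .map_err(|_| Utf16DecodeError::UnpairedSurrogate)
}
```

A parser for `MxReferenceHandle` names would propagate the error with `?` and map it into whatever codec error variant the design settles on; that mapping is left open here.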

## design/40-protocol-invariants.md

- [RESOLVED] `[BLOCKER]` Reference registration request prefix is severely under-documented. Doc shows prefix `cmd(1) + version(2) + itemHandle(i32) + correlation(GUID 16) + (-1 i16) + reservedByte + (1 i32) + itemDefinition…` (design/40-protocol-invariants.md:172-180). Source `NmxReferenceRegistrationMessage.cs:15` defines `HeaderLength = 55`, with explicit writes only at offsets 0, 1, 3, 7, 23, 27 (src/MxNativeCodec/NmxReferenceRegistrationMessage.cs:80-87). Bytes 25-26 and 31-54 (26 bytes) are zero-initialized but never described. CLAUDE.md says "preserve unknown bytes"; the doc elides them. A Rust port reading the spec will encode a 30-byte prefix instead of 55.
- [RESOLVED] `[BLOCKER]` Write body common prefix is wrong. Doc (design/40-protocol-invariants.md:108) says "cmd(1) + version(2) + padding(2) + handle_projection(14)" and locates wireKind at offset 17. Source `NmxWriteMessage.cs:11-13` has `HandleProjectionOffset = 3`, `HandleProjectionLength = 14`, `KindOffset = 17` — so layout is `cmd(1) + version(2) + handle_projection(14)`, no 2-byte padding. The 14 bytes are at offsets 3..17, leaving wireKind at 17. Doc inserts a phantom 2-byte padding.
- [RESOLVED] `[BLOCKER]` String/DateTime write total size is wrong. Doc (design/40-protocol-invariants.md:118-119) says total = "26 + N". `NmxWriteMessage.cs:150` computes `KindOffset(17) + 1 + 4 + 4 + N + 14 + 4 = 44 + N`. The 26+N figure is off by 18 bytes. Anyone implementing to spec will under-allocate.
- [RESOLVED] `[BLOCKER]` Array header element-width type disagrees between the write encoder and the subscription decoder. Doc (design/40-protocol-invariants.md:120) says array layout is `count(u16) + width(u16) + 2N + suffix`. Encoder agrees: `NmxWriteMessage.cs:181-182` writes count u16 at body[22], elementWidth u16 at body[24]. But the **decoder** at `NmxSubscriptionMessage.cs:264-265` reads `count` as u16 at body+4 and `elementWidth` as **i32** at body+6. Either the encoder or decoder is wrong, and the doc only captures one shape. This is a load-bearing inconsistency that the BoM doc must resolve, not paper over.
- [RESOLVED] `[BLOCKER]` Boolean write value section description is wrong. Doc (design/40-protocol-invariants.md:114) says "1 byte (0xFF/0x00) + 3 reserved bytes". `NmxWriteMessage.cs:257` encodes `[0xff, 0xff, 0xff, 0x00]` (true) or `[0x00, 0xff, 0xff, 0x00]` (false) — bytes 1 and 2 are 0xFF, not "reserved". Reserved implies don't-care; native sets them to 0xFF. CLAUDE.md mandates preserving unknown bytes; calling them "reserved" invites zeroing.
- [RESOLVED] `[BLOCKER]` AdviseSupervisory and Advise share opcode 0x1f but the parser rejects plain Advise. Doc (design/40-protocol-invariants.md:97-99) lists three commands: `Advise (plain) 0x1f / 37 bytes`, `AdviseSupervisory 0x1f / 39 bytes`, `UnAdvise 0x21 / 37 bytes`. But `NmxItemControlMessage.cs:46-49` rejects `command is not (AdviseSupervisory or UnAdvise)` — a 37-byte 0x1f message fails to parse. Either the doc must remove "plain Advise" or call out that the codec emits/accepts only the supervisory form (39 bytes). The duplicate enum entries `Advise = 0x1f, AdviseSupervisory = 0x1f` in the source signal this is unresolved (`NmxItemControlMessage.cs:7-8`).
- [RESOLVED] `[MAJOR]` `recordCount != 1` invariant for DataUpdate is missing. `NmxSubscriptionMessage.cs:71-74` hard-throws if `recordCount != 1` for the 0x33 DataUpdate frame. The doc treats `recordCount(i32, typically 1)` as a casual hint (design/40-protocol-invariants.md:161) and only mentions multi-record as "not yet wire-proven" (design/40-protocol-invariants.md:303). For a bill-of-materials this is an enforced invariant, not a soft expectation. The Rust port must replicate the throw to match parity.
- [RESOLVED] `[MAJOR]` HRESULT 0x8007139F meaning conflicts across docs and elides the canonical name. Doc (design/40-protocol-invariants.md:272) says "Uninitialized object". `design/50-error-model.md:133` maps it to `EngineNotRegistered`. `docs/Capture-Run-2026-04-25.md:888` and `docs/MXAccess-Public-API.md:326` document it as **`ERROR_INVALID_STATE`**, observed from `ProcessActivateSuspend2`. The 40-doc cites neither file and gives a folkloric description.
- [RESOLVED] `[MAJOR]` Write2 timestamped suffix description is incoherent. Doc (design/40-protocol-invariants.md:131-132): "8-byte FILETIME inserted between offsets 12 and 19 of the suffix region". `NmxWriteMessage.cs:240-251` `WriteTimestampedSuffix` actually replaces the 8-byte filler at suffix offsets 2..10 with the FILETIME (and changes the leading i16 from `-1` to `0`). The "between 12 and 19" wording does not match any offset in the source.
- [RESOLVED] `[MAJOR]` `tail` value for item-control is not specified. `NmxItemControlMessage.cs:88` defaults `tail = 3` for advise/unadvise. The doc only describes shape (`tail(4)`) and never names the constant or its value (design/40-protocol-invariants.md:102). The Rust port will pick a different value and fail at the responding NMX. This is the kind of byte-exactness this BoM is supposed to nail down.
- [RESOLVED] `[MAJOR]` Envelope ProtocolMarker described as `0x0201` int32 — but bytes 38-41 hold the LE u32. Doc (design/40-protocol-invariants.md:67) calls offset 38..42 a single i32 = `0x0201`. `NmxTransferEnvelope.cs:99` writes `ProtocolMarker = 0x0201` as Int32LE at offset 38. So bytes are `01 02 00 00`. The doc's claim is technically OK but the BoM should call out the wire-byte sequence to prevent endianness mistakes, since the value visually reads as a high byte 0x02. Minor relative to others but worth noting.
- [RESOLVED] `[MAJOR]` Reserved offset 6 in the envelope is described as "preserved from observed (default 0)" but `Parse` does not retain it. Doc (design/40-protocol-invariants.md:59) promises round-trip, but `NmxTransferEnvelope.cs:39-75` reads only Version, InnerLength, ProtocolMarker, etc., and discards bytes 6..10 entirely. The Encode path always writes 0 (line 91). This violates CLAUDE.md "preserve unknown bytes" and the doc misrepresents the implementation.
- [RESOLVED] `[MAJOR]` DCE/RPC bind UUID and active-interface UUID lack file:line citations. Two of the most load-bearing values for activation (design/40-protocol-invariants.md:13-14) cite "`docs/Loopback-Protocol-Findings.md`" with no line number, while every other COM identifier in the same table has a `:N` citation. For a BoM this is a missing receipt.
- [RESOLVED] `[MAJOR]` CRC-16 attribute signature initial value is "`0`" but the doc never references the spot in `MxReferenceHandle.cs` that proves `0` (vs e.g. `0xFFFF`). Doc (design/40-protocol-invariants.md:90) asserts initial value 0. `MxReferenceHandle.cs:51` does start with `ushort crc = 0`, so the claim is correct but the byte order step (low byte then high byte of UTF-16LE per char) is shown at lines 53-56 — these need explicit `:LINE` anchors, not just a range. Lines 108-119 are the inner CRC step, not the per-char loop.
- [RESOLVED] `[MAJOR]` Status-detail table omits Source field and the `Detail` is `short` not byte. Doc (design/40-protocol-invariants.md:279-291) lists detail codes as bare integers. `MxStatus.cs:32` declares `Detail` as `short` (signed 16-bit). The two callbacks (DataUpdate quality `u16`, status records `i32`) use different widths (`NmxSubscriptionMessage.cs:126,132,136`). Whether the wire is i32 (record status) or short (`MxStatus.Detail`) matters for sign extension on detail 8017. Doc collapses both.
- [RESOLVED] `[MINOR]` DataUpdate record layout in doc claims `quality(u16) + timestamp_filetime(i64) + wireKind(u8) + value(N)` after status — confirmed by `NmxSubscriptionMessage.cs:126-143` for `hasDetailStatus=false`. SubscriptionStatus claims `status(i32) + detailStatus(i32) + quality(u16) + timestamp + wireKind + value`, also confirmed. But DataUpdate header is `cmd(1) + version(2) + recordCount(4) + operationId(16) = 23 bytes` then records. SubscriptionStatus header is the same 23 bytes **plus** a per-message correlationId(16) at offset 23, records start at 39 (`NmxSubscriptionMessage.cs:98-99`). Doc shows the correlationId for SubscriptionStatus only on line 153, which is correct, but the BoM table lacks an explicit "header length 23 vs 39" which would help an implementer. NIT level since the byte math is correct.
- [RESOLVED] `[MINOR]` Boolean array on the wire uses i16 elements (`-1`/`0`), not bool bytes. `NmxWriteMessage.cs:307` writes `(short)-1` for true. `NmxSubscriptionMessage.cs:282` decodes width=`sizeof(short)`. Doc (design/40-protocol-invariants.md:120) shows `2N` for BoolArray which is consistent but doesn't say the encoding is `0xFFFF`/`0x0000` two's complement. Worth noting.
- [RESOLVED] `[MINOR]` `RegisterEngine` opnum 3 has the same signature as `INmxService::RegisterEngine` — but the **`INmxService2` interface re-declares `new`** versions of all base methods (`NmxComContracts.cs:55-73`). This means COM proxy/stub for `INmxService2` exposes its own opnum table starting at 3, not inheriting opnums from `INmxService`. Doc (design/40-protocol-invariants.md:19) says "Opnums are sequential across the inheritance" — strictly speaking, with `new` declarations the C# vtable has its own slots. This needs a sentence saying "in the IDL these are sequential because `INmxService2 : INmxService`, but in the .NET interop they are re-declared `new`". Otherwise the Rust port may misinterpret what is bound.
- [RESOLVED] `[NIT]` Reference Result message tail described as "tail(16 zero)" (design/40-protocol-invariants.md:191). `NmxReferenceRegistrationResultMessage.cs` not read but per CLAUDE.md should preserve verbatim — doc claim "all zero" needs a Frida-capture citation, not assertion.
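
Several of the byte-exact facts cited in the findings above can be pinned as small executable checks. The following is a sketch, not the codec: every function name is hypothetical, and the values are taken only from the source lines the review cites (`NmxWriteMessage.cs:150`, `:257`, `:307` and `NmxTransferEnvelope.cs:99`).

```rust
/// Offset of the wire-kind byte in a write body (`KindOffset = 17`).
pub const KIND_OFFSET: usize = 17;

/// Envelope protocol marker, written as a little-endian Int32 at offset 38.
pub const PROTOCOL_MARKER: i32 = 0x0201;

/// Boolean write value section. Bytes 1 and 2 are 0xFF in the native
/// encoding, so they are reproduced verbatim rather than zeroed.
pub fn encode_bool_value(value: bool) -> [u8; 4] {
    let first = if value { 0xff } else { 0x00 };
    [first, 0xff, 0xff, 0x00]
}

/// Wire bytes of the protocol marker. Spelling them out guards against
/// reading `0x0201` as a big-endian value.
pub fn protocol_marker_wire_bytes() -> [u8; 4] {
    PROTOCOL_MARKER.to_le_bytes()
}

/// Total size of a String/DateTime write with an N-byte payload, using the
/// same terms as the expression quoted from `NmxWriteMessage.cs:150`:
/// KindOffset + 1 + 4 + 4 + N + 14 + 4.
pub fn string_write_total_len(payload_len: usize) -> usize {
    KIND_OFFSET + 1 + 4 + 4 + payload_len + 14 + 4
}

/// One Boolean array element: an i16 that is -1 for true and 0 for false.
pub fn encode_bool_array_element(value: bool) -> [u8; 2] {
    let element: i16 = if value { -1 } else { 0 };
    element.to_le_bytes()
}
```

Checks like these are cheap to keep next to the BoM so that a doc edit which reintroduces "26 + N" or "3 reserved bytes" fails a test rather than a live probe.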

## design/50-error-model.md

- [RESOLVED] `[BLOCKER]` `0x80004021` mapping is fabricated for "regular operations" (design/50-error-model.md:130). The .NET reference NEVER emits `0x80004021` from a non-secured path — it only appears in `MxNativeSession.WriteSecuredAsync` as the explanation for the `NotSupportedException` thrown locally before any wire call (src/MxNativeClient/MxNativeSession.cs:219-220). The "regular operation" stale-handle code observed across all captures is `0x80070057` E_INVALIDARG (e.g. captures/probe-add-remove.log:7, captures/008-write-test-int-same-value/harness.log:7, plus the `LmxProxy.dll` decompile returns `0x80070057` for stale handles at analysis/ghidra/exports/LmxProxy.dll.item-helper-decompile.md:60,75,88,164). The `StaleItem` arm of the split is unsupported by evidence; the design even contradicts itself one row below by mapping `0x80070057` to `InvalidArgument`. Pick one.
- [RESOLVED] `[BLOCKER]` `WriteSecuredForbidden` cannot ever be returned by the Rust port as designed (design/50-error-model.md:101,129). The .NET reference does not surface `0x80004021` from a wire response: `WriteSecuredAsync` never reaches the wire and instead executes `throw new NotSupportedException(...)` (src/MxNativeClient/MxNativeSession.cs:218-221). For Rust parity this should map to `Error::Unsupported(...)` at the API boundary, not a runtime HRESULT translation. The `SecurityError::AccessDenied { detail }` path (design/50-error-model.md:115) is also unreachable through the proven stack: captures 111/112 show the same `0x80004021` even after auth changes (captures/112-frida-write-secured-auth-verified-protectedvalue1/frida-events.tsv:69).
- [RESOLVED] `[MAJOR]` `0x800706BA` is not transient in the .NET reference — `is_retryable=true` is wrong (design/50-error-model.md:135,200). Every observed instance is a structural callback-marshalling failure, not a flapping NmxSvc: analysis/proxy/managed-registerengine2-callback-probe.txt:8, …-loopback-probe.txt:8, …-fixed-port-probe.txt:8 and docs/NMX-COM-Contracts.md:592 all describe it as the "no SYN to advertised port" outcome after security bindings are added — a config bug, not a retryable transient. Retrying loops forever.
- [RESOLVED] `[MAJOR]` `is_retryable()` for `Connection(TcpConnect)` (design/50-error-model.md:188) glosses over `io::ErrorKind::AddrNotAvailable` / `HostUnreachable` / `ConnectionRefused` vs `TimedOut`. Retrying a name-not-found immediately is a hot-loop bug. `ConnectionError::TcpConnect` carries the `io::Error` (design/50-error-model.md:59) — `is_retryable` must inspect `kind()` like `Io(_)` does on line 206, not blanket-yes.
- [RESOLVED] `[MAJOR]` `MxStatusCategory` and `MxStatusSource` are leaked into `Error::Status` without `#[non_exhaustive]` (design/50-error-model.md:45). They are plain Rust enums in the design but mirror short-valued .NET enums (src/MxNativeCodec/MxStatus.cs:3,17) where AVEVA could legally introduce new categories — `Unknown=-1` already implies open-set. Without `#[non_exhaustive]`, downstream `match` statements lock the API. Note also `MxStatusSource` exposes six values (`RequestingLmx`…`RespondingAutomationObject`) — the Rust port should mirror these names exactly; the design never lists them.
- [RESOLVED] `[MAJOR]` `Error::Status { detail: i16 }` drops `MxStatus.Success` (src/MxNativeCodec/MxStatus.cs:29). The .NET `MxStatus` is a 4-tuple `(Success, Category, DetectedBy, Detail)`, and `Success=-1` is the documented "OK" sentinel (MxStatus.cs:36-58). The Rust error model loses the Success short, breaking byte-parity diagnostics demanded by CLAUDE.md ("Preserve unknown bytes").
- [RESOLVED] `[MAJOR]` `ConfigurationError` category is non-retryable per the recovery table (design/50-error-model.md:201), but detail=21 ("Invalid reference", src/MxNativeCodec/MxStatus.cs:76) is a cold-cache miss that becomes valid after Galaxy resolver refresh — the .NET path retries by re-resolving (`MxNativeSession.ResolveTagAsync` is called every operation, src/MxNativeClient/MxNativeSession.cs:173,196,232,255). Mapping the entire `ConfigurationError` category to `is_retryable=false` is too coarse.
- [RESOLVED] `[MAJOR]` `MxValueKind` `Debug` derive is not guaranteed by the codec (design/50-error-model.md:30). The .NET enum `MxValueKind` (src/MxNativeCodec/MxValueKind.cs:3-18) is the spec; the Rust port must explicitly `#[derive(Debug)]` and the design's `expected:?, actual:?` format (line 29) requires it. Not a blocker if obeyed, but the design must state it — neither `10-raw-layer.md` nor `50-error-model.md` constrain the codec to derive `Debug`.
- [RESOLVED] `[MAJOR]` `RpcError` is exposed in the public `ConnectionError::OxidResolve { source: RpcError }` (design/50-error-model.md:61) but `RpcError` is defined only in the raw layer (design/10-raw-layer.md:282-285) with no documented `std::error::Error` impl, no `Display`, and no `#[non_exhaustive]` declaration cited. `#[source]` requires `std::error::Error`. The design must promote `RpcError` to a public, stable, `Error`-implementing type or wrap it.
- [RESOLVED] `[MINOR]` Cancellation policy on `UnAdvise` failure is unspecified (design/50-error-model.md:145). "Best-effort" with no statement about whether the failure is logged, traced, or silently dropped. The .NET reference logs to `mx.unadvise.error` (captures/probe-add-remove.log:7) — the Rust port should commit to `tracing::warn!` and say so.
- [RESOLVED] `[MINOR]` Panic policy is incomplete (design/50-error-model.md:154-159). `clippy::panic = deny` does not cover `unreachable!()`, `todo!()`, indexing panics (`a[i]`), arithmetic overflow panics, or slice bounds. Add `clippy::indexing_slicing`, `clippy::unreachable`, `clippy::todo`, `clippy::arithmetic_side_effects` (or document why omitted). Test override "via `#[cfg(test)]`" is vague — `#![cfg_attr(test, allow(clippy::unwrap_used))]` is the actual pattern.
- [RESOLVED] `[NIT]` `0x8001011D` `CallbackObjRefRejected` mapping is supported but the cite is thin: it appears only in probe-narrative docs (docs/NMX-COM-Contracts.md:591), not as a logged HRESULT in `captures/`. The design should cite docs/NMX-COM-Contracts.md:590-594 directly so future maintainers can find it.
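
To illustrate the `is_retryable()` finding above, here is one way `ConnectionError::TcpConnect` could inspect `io::Error::kind()` instead of answering a blanket yes. The enum is trimmed to a single variant, and the split between retryable and non-retryable kinds is an assumption made for this sketch; the design would need to state the real policy.

```rust
use std::io;

/// Trimmed-down stand-in for the design's `ConnectionError`; only the
/// variant discussed in the review is modelled here.
#[derive(Debug)]
#[non_exhaustive]
pub enum ConnectionError {
    TcpConnect(io::Error),
}

impl ConnectionError {
    /// Assumed policy: only kinds that can plausibly clear on their own
    /// are retryable. Refused or unavailable addresses fall through to
    /// `false` so a misconfiguration cannot turn into a hot loop.
    pub fn is_retryable(&self) -> bool {
        match self {
            ConnectionError::TcpConnect(err) => matches!(
                err.kind(),
                io::ErrorKind::TimedOut
                    | io::ErrorKind::Interrupted
                    | io::ErrorKind::ConnectionReset
            ),
        }
    }
}
```

Defaulting unknown kinds to non-retryable is the conservative choice: `io::ErrorKind` is itself non-exhaustive, so any kind added by a later toolchain fails closed.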

## design/60-roadmap.md

- [RESOLVED] `[BLOCKER]` M1 DoD claims "every Frida-captured write/advise/subscribe body in `captures/0NN-frida-*` round-trips byte-identical" but several captures contain unresolved native behaviour the .NET reference itself does not yet decode (design/60-roadmap.md:35). Evidence: captures 079-082 and 094 are buffered-advise scenarios where work_remain.md:177-181 says the codec layout for buffered batches is unproven (provider only emits single-sample batches), and 70-risks-and-open-questions.md:21-26 (R2) explicitly defers buffered batch parity. 036 (single-token `WriteSecured`) returns `0x80004021` with no payload sent (R6, lines 53-58). The trigger conditions for 077-078 (suspend/activate) are unknown (R5). Round-trip-against-themselves is achievable, but "byte-identical to the .NET reference's encode" is not, because the reference does not encode these flows.
- [RESOLVED] `[BLOCKER]` M5 DoD requires the ASB type matrix to cover `Boolean, Int32, Float, Double, String, DateTime, Duration, and the corresponding arrays` "matching `work_remain.md:108-113`" — but work_remain.md:109 only proves "deployed array tags," not all eight scalar arrays (design/60-roadmap.md:71). The DoD over-states what the .NET reference has actually proven; less-common ASB types are explicitly deferred ("Remaining work is adding less common ASB types only as needed"). Citing the line in `work_remain.md` as authority for a stronger claim than the file makes is a dependency on unproven behaviour.
- [RESOLVED] `[BLOCKER]` M6 DoD "`cargo bench` shows codec encode/decode latency comparable to or faster than the .NET reference" directly contradicts the explicit V1 non-goal "`cargo bench` numbers as gating criteria. We measure but don't gate beyond M6's loose acceptance bar." (design/60-roadmap.md:82 vs 60-roadmap.md:149). Also, "comparable to or faster than" has no defined comparison harness — the .NET reference has no microbench project. The DoD is unmeasurable as written. Same milestone also asserts "live subscribe under churn does not allocate per-message" with no allocation-counting tooling specified (compare to 70-risks-and-open-questions.md:102-108 R12 which only aims for "< 5 allocations per write at steady state").
- [RESOLVED] `[MAJOR]` M2 DoD "non-zero `partnerVersion` against a running AVEVA install" is boolean-trivial and re-proves what docs/DotNet10-Native-Library-Plan.md:64-73 already established (`partner_version=6`) (design/60-roadmap.md:43). It does not exercise `RegisterEngine2`, callback OBJREF export, or actually receiving a status frame, despite the prose two lines above (line 41) listing those as in-scope. The DoD is weaker than what M2 promises to deliver and cannot detect regressions in the callback exporter — which is the *hardest* part of M2 per the .NET evidence (`MxNativeClient` had to hand-roll `INmxSvcCallback`/`IRemUnknown`).
- [RESOLVED] `[MAJOR]` Sequencing claim "M5 can run in parallel with M3/M4" is partially false (design/60-roadmap.md:142,145). M5 says `mxaccess::Session over AsbTransport` (line 68) — the `Session` type, recovery policy, `Stream<Item = DataChange>`, and `Transport` trait are all defined in M4 (lines 55-60). M5 cannot land its public surface before M4 lands the Session abstraction. The "ASB does not depend on DCE/RPC" justification only covers the *transport*, not the shared `Session`. This contradicts 30-crate-topology.md:64-65 where `mxaccess` (top-level) sits below both transports.
- [RESOLVED] `[MAJOR]` Cross-implementation parity assumes the .NET reference is correct, but work_remain.md:170-174 flags the completion-only byte mapping (`0x00`/`0x41`/`0xEF`) as unmapped and `MXSTATUS_PROXY[]` conversion as missing (design/60-roadmap.md:96-102). Any Rust port that "matches the .NET reference" inherits the same gap — passing parity tests does not mean correct behaviour. The roadmap should mark these as "preserved verbatim" rather than green-checked by parity. R3/R4 acknowledge this but the validation strategy section does not.
- [RESOLVED] `[MAJOR]` Cross-platform drift: roadmap line 153 says "Linux is a stretch goal sitting behind feature flags" but 30-crate-topology.md:91 claims `mxaccess-galaxy` is "All Rust targets (TDS works cross-platform via `tiberius`)" and the codec is also "All Rust targets" (line 82), and 70-risks-and-open-questions.md:128-134 Q3 says "Linux-best-effort." Three documents, three different positions. The roadmap is the most restrictive; either it is wrong or the topology over-promises.
- [RESOLVED] `[MAJOR]` Live-probe CI story is unspecified. M2/M3/M4/M5 DoDs all require a "running AVEVA install" + Galaxy DB + `MX_LIVE` env (60-roadmap.md:43,51,62,71,93-94). There is no description of how this CI lane is provisioned — AVEVA System Platform is a licensed Windows-only install with a SQL Galaxy. Without a hosted runner story, every milestone's DoD reduces to "the author ran it once on their workstation." This is not a regression-detecting test line.
- [RESOLVED] `[MINOR]` M3 DoD cites `captures/022-frida-int-write*` and `captures/077-frida-advise*` for byte-identical comparison (60-roadmap.md:51), but `077` is `077-frida-suspend-advised-scanstate` (per the directory listing), which is a *suspend* capture, not an advise capture. The advise/subscribe captures are `058-060`; `077` actually exercises the unproven Suspend path (R5). The citation appears wrong or aspirational.
- [RESOLVED] `[MINOR]` M0 DoD includes `cargo publish --dry-run -p mxaccess-codec` (60-roadmap.md:16) but 30-crate-topology.md:269 and Q2 (70-risks-and-open-questions.md:120-126) flag that the license is unconfirmed and there is no LICENSE in the repo. `cargo publish --dry-run` will refuse without `license` + `license-file` resolved. M0 cannot complete cleanly until Q2 is settled, but the dependency table does not reference Q2.
- [RESOLVED] `[NIT]` `tests/fixtures/` populated from `captures/` via "junction or copy" (60-roadmap.md:12, 30-crate-topology.md:29) — junctions on Windows do not survive `git clone` on Linux/macOS, breaking the cross-platform stretch goal at the fixture-loading layer. Pick one.
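
The M6 finding above notes that "does not allocate per-message" has no allocation-counting tooling behind it. One possible shape for that tooling, using only the standard library, is a counting global allocator. This is a sketch under stated assumptions: `CountingAlloc` and `count_allocations` are hypothetical names, and the counter is process-wide, so concurrent threads inflate the number.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Number of `alloc` calls observed since process start.
static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

/// Wrapper around the system allocator that counts allocation requests.
pub struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // Delegate the actual allocation to the system allocator.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Run `f` and report how many allocations happened while it ran.
/// The count is a lower bound on nothing and an upper bound on `f`'s own
/// allocations: other threads allocating at the same time are included.
pub fn count_allocations<R>(f: impl FnOnce() -> R) -> (R, usize) {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let result = f();
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    (result, after - before)
}
```

A steady-state check would wrap a single encode or decode call in `count_allocations` inside a single-threaded test and compare the count against whatever budget the roadmap commits to, such as the "< 5 allocations per write" figure quoted from R12.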

## design/70-risks-and-open-questions.md

- [RESOLVED] `[BLOCKER]` `mxaccess-asb-soap` framing risk is mis-scoped — the .NET reference uses `NetTcpBinding` with default **BinaryMessageEncoder** (.NET Binary XML / NBFX / [MC-NBFX]+[MC-NBFS]), NOT SOAP/XML. R1 talks about hand-rolling [MS-NMF] (~1500 LoC) but never lists implementing the binary dictionary/string-table codec, and 30-crate-topology.md:130 pins `quick-xml` for an XML parse path that does not exist on the wire. The risk register should record "no XML on the wire — must implement [MC-NBFX] decoder; `quick-xml` is the wrong dep" as its own R-item (design/70-risks-and-open-questions.md:7-18). Evidence: src/MxAsbClient/MxAsbDataClient.cs:663 `new NetTcpBinding(SecurityMode.None)` with no message-encoding override → WCF defaults to binary; design/30-crate-topology.md:130 lists `quick-xml`. Currently missing.
- [RESOLVED] `[BLOCKER]` `recordCount != 1 → throw` invariant has no risk entry. src/MxNativeCodec/NmxSubscriptionMessage.cs:71-74 hard-rejects any DataUpdate with more than one record. R2 is about the *missing fixture*, but the much bigger risk — that the codec will **panic in production** the first time AVEVA emits a multi-record `0x33` — is unrecorded. Either lift the constraint defensively or add this as its own R-item with a "behavior on rejection" plan (design/70-risks-and-open-questions.md:20-26). Evidence: src/MxNativeCodec/NmxSubscriptionMessage.cs:71-74; design/40-protocol-invariants.md:303 acknowledges the gap but only as a fixture problem.
- [RESOLVED] `[BLOCKER]` `0x80004021 → StaleItem` mapping in 50-error-model.md:130 is unevidenced. R6 admits "the reason is unknown" for secured writes, then the error model casually maps the same HRESULT on regular operations to a brand-new `StaleItem` enum that no capture, decompile, or .NET reference produces. This is invented protocol behavior — directly violates CLAUDE.md "do not fabricate protocol behavior" and is not flagged (design/70-risks-and-open-questions.md:52-58). Evidence: design/50-error-model.md:130 — `StaleItem` synthesised; no fixture cited; design/40-protocol-invariants.md:270 only documents the secured-write path.
- [RESOLVED] `[MAJOR]` R3/R4 "Settles when" depends on Ghidra output `analysis/ghidra/aaDCT tables` that the doc itself admits is "currently absent." work_remain.md:173-174 reports "Current available decompiled/Ghidra outputs did not expose a mapping table for completion-only bytes." There is no plan, owner, or alternate path (e.g. native callback frida capture mentioned in work_remain.md:170). This is indefinitely punted — should be relabeled "indefinitely deferred" like the OperationComplete row in the evidence-gaps table, and stripped of fake settle criteria (design/70-risks-and-open-questions.md:34, 42).
- [RESOLVED] `[MAJOR]` No risk for `Drop`-time async cleanup hazards. Q6 covers `Clone` semantics but the design at 20-async-layer.md:70 and 50-error-model.md:145-146 openly admits drop fires `tokio::spawn` for `UnAdvise`/`UnregisterEngine`, which requires a Tokio runtime to be alive at drop time. Drop in a sync context (e.g. final teardown after `Runtime::shutdown_timeout`) will silently skip the unadvise/unregister and leak server-side handles in NMX. This is a correctness hazard and not in R1–R12 (design/70-risks-and-open-questions.md:152-158). Evidence: design/00-overview.md:38 "No spawn from inside Drop" directly contradicts design/20-async-layer.md:70 and 50-error-model.md:145.
- [RESOLVED] `[MAJOR]` Crypto/auth crate version-drift risk is missing. 30-crate-topology.md:130 pins `rc4`, `sha-1`, `md-5`, `num-bigint` — all of which are at minimum-maintenance / deprecation watchlist (`rc4` is yanked-adjacent, `sha-1` v0.10 is the last RustCrypto release with a deprecation warning). Combined with 10-raw-layer.md:252 "Do not pull `ring` — hand-roll MD4," the auth surface area depends on ~5 marginal crates. No R-item tracks "what happens when one of these is yanked / advisory'd in CI." Currently missing.
- [RESOLVED] `[MAJOR]` R1 and R12 have inverted severity. R1 is "hand-roll [MS-NMF] reliable-session, NBFX/NBFS dictionary codec, DH key agreement (~1500 LoC, plus the entire WCF binary message encoder you forgot)" — that's the entire ASB data plane. R12 is `BytesMut::with_capacity` micro-optimization. Treating them as peer entries in an unsorted list misrepresents the project's blocker surface. The doc has no severity tier; introduce one or reorder (design/70-risks-and-open-questions.md:7, 102).
- [RESOLVED] `[MAJOR]` Q1 says "sibling `rust/`" but no workspace exists. `Glob` of `rust/` confirms zero files; CLAUDE.md itself hedges ("when it is started"). The "best answer" is presented as confirmed but is in fact still a question — and M0 cannot start without it. Either escalate to an explicit decision item with an owner, or stop calling it answered (design/70-risks-and-open-questions.md:112-118).
- [RESOLVED] `[MINOR]` Q3 contradicts crate topology. Q3 says "Linux-best-effort … macOS unsupported in V1." 00-overview.md:35 describes "Windows-first, **cross-platform-aware**" with ASB SOAP framing portable. 30-crate-topology.md:130 lists no `cfg(windows)` gating on `mxaccess-asb-soap` deps. So either the soap crate compiles on Linux (contradicting Q3 best-effort) or it doesn't (contradicting the topology). Resolve in one place (design/70-risks-and-open-questions.md:128-134).
- [RESOLVED] `[MINOR]` "No risk: NDR-bridge" and "No risk: custom TLB" are asserted with docs/Transport-Correlation.md and "the .NET reference" as citations but no `:line` numbers and no Ghidra/capture artifact. CLAUDE.md mandates evidence; the section that proudly declares non-issues is the only section in the file with weaker citations than the risks themselves (design/70-risks-and-open-questions.md:175-187).
- [RESOLVED] `[MINOR]` Evidence-gaps table omits two fixtures present in work_remain.md: (a) source-level no-communication evidence (work_remain.md:198) and (b) live partial-cleanup behavior after channel failure (work_remain.md:196-197). Both are open in work_remain.md but missing from the gap table (design/70-risks-and-open-questions.md:164-171).
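
For the `recordCount != 1` finding above, a "behavior on rejection" plan could return a typed error rather than panicking. The sketch below assumes the 23-byte DataUpdate header quoted earlier in this log (`cmd(1) + version(2) + recordCount(4) + operationId(16)`) and assumes `recordCount` is little-endian; the error type and function name are hypothetical.

```rust
/// Command byte of the DataUpdate frame.
pub const DATA_UPDATE_CMD: u8 = 0x33;

/// cmd(1) + version(2) + recordCount(4) + operationId(16).
pub const DATA_UPDATE_HEADER_LEN: usize = 23;

/// Hypothetical error type; the design does not define one for this case.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DataUpdateError {
    Truncated { needed: usize, actual: usize },
    UnexpectedCommand(u8),
    /// Mirrors the .NET reference's rejection, but as a value the caller
    /// can log and recover from instead of a panic.
    UnsupportedRecordCount(i32),
}

/// Validate the DataUpdate header and return the record count (always 1).
pub fn parse_record_count(frame: &[u8]) -> Result<i32, DataUpdateError> {
    // The slice pattern pulls out the bytes needed without indexing.
    let (cmd, count) = match frame {
        [cmd, _, _, c0, c1, c2, c3, ..] if frame.len() >= DATA_UPDATE_HEADER_LEN => {
            (*cmd, i32::from_le_bytes([*c0, *c1, *c2, *c3]))
        }
        _ => {
            return Err(DataUpdateError::Truncated {
                needed: DATA_UPDATE_HEADER_LEN,
                actual: frame.len(),
            })
        }
    };
    if cmd != DATA_UPDATE_CMD {
        return Err(DataUpdateError::UnexpectedCommand(cmd));
    }
    if count != 1 {
        return Err(DataUpdateError::UnsupportedRecordCount(count));
    }
    Ok(count)
}
```

Whether a multi-record frame should drop the subscription, be surfaced to the caller, or be preserved verbatim for later analysis is a policy question that the risk entry still has to answer.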