mxaccess/design/60-roadmap.md
Joseph Doherty fe2a6db786
Initial project state: .NET reference, design, Rust port (M0+M1), evidence
Layout:
- src/                    .NET 10 x64 reference: MxNativeCodec, MxNativeClient,
                          MxAsbClient, probes, tests, harnesses. Executable spec.
- design/                 Architectural plan for the Rust port (M0–M6), error
                          model, protocol invariants, risks (R1–R16), adversarial
                          review log (review.md).
- rust/                   Rust workspace. M0 skeleton + M1 codec parity.
                          mxaccess-codec: 215 unit tests + 2 cross-implementation
                          parity tests (byte-identical against .NET reference).
                          Other crates are M0 stubs awaiting M2+.
- captures/               Frida + netsh + pcap evidence per CLAUDE.md
                          ("captures are evidence, not throwaway logs").
- analysis/               Decompiled C# (frida/proxy/decompiled-*),
                          Ghidra exports for native DLLs (`exports/` only —
                          working state at `projects/` and AVEVA's input
                          binaries at `input/` are gitignored).
- docs/                   Reverse-engineering reference docs.
- tools/                  Setup-LiveProbeEnv.ps1 (Infisical credential fetcher),
                          Compute-Crc.ps1 (.NET parity helper).
- .github/workflows/      Rust CI: fmt + build + test + clippy on Windows.
- LICENSE                 MIT (Joseph Doherty, 2026).

Verified:
- cargo test --workspace → 217 passed (215 unit + 2 .NET parity), 0 failed
- cargo clippy --workspace -- -D warnings → clean
- cargo fmt --all -- --check → clean
- cargo publish --dry-run -p mxaccess-codec → packages cleanly

Excluded from history (see .gitignore):
- **/bin, **/obj, **/target — build artifacts
- analysis/ghidra/projects/ — Ghidra working state (regenerable)
- analysis/ghidra/input/ — AVEVA proprietary DLLs (vendor IP)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 06:21:00 -04:00


Roadmap

The Rust port is staged so each milestone is independently usable: a milestone delivers either a useful crate, a measurable test improvement, or a concrete API surface. Earlier milestones never depend on later ones.

Phasing

M0 — Workspace skeleton

  • Create rust/ workspace per 30-crate-topology.md.
  • Pin Rust toolchain via rust-toolchain.toml.
  • CI: cargo build --workspace, cargo test --workspace, cargo clippy --workspace -- -D warnings, cargo fmt --check.
  • Test infrastructure: tests/fixtures/ populated from captures/ by plain copy. Junctions are Windows-only and don't survive git clone on Linux/macOS, and symlinks need Developer Mode on Windows, so a plain copy is the only cross-platform option. 30-crate-topology.md:29 states the same rule; flag any drift between the two to the user.
  • mxaccess-codec exposes the type stubs (empty bodies returning unimplemented!) so downstream crates compile.
  • Define the Transport trait (and a placeholder Session shape it returns) in M0 as empty/stub signatures in mxaccess so M5 can build against the trait without waiting for M4's NMX implementation. This is what allows M5 to run in parallel with M3/M4 — see "Sequencing dependencies" below. M4 fills in the concrete NmxTransport impl + recovery policy + Stream<Item = DataChange> plumbing; M5 fills in AsbTransport against the same trait.
  • Update CLAUDE.md "Common commands" with cargo invocations.
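
The M0 trait split described above can be sketched as follows. This is a minimal, synchronous illustration under stated assumptions: only the names Transport, Session, NmxTransport and AsbTransport come from this roadmap; TransportError, the method names, and the open helper are hypothetical, and the real trait is expected to be async.

```rust
// Hypothetical M0-level shapes. Only `Transport`, `Session`, `NmxTransport`
// and `AsbTransport` are named in the roadmap; everything else is illustrative.
#[derive(Debug, PartialEq)]
pub enum TransportError {
    /// The transport is still an M0 stub.
    NotImplemented(&'static str),
}

/// Placeholder session shape returned by a transport.
#[derive(Debug, PartialEq)]
pub struct Session {
    pub transport_name: &'static str,
}

/// Published in M0 so that M4 (NMX) and M5 (ASB) can build against it.
pub trait Transport {
    fn name(&self) -> &'static str;
    fn connect(&self) -> Result<Session, TransportError>;
}

pub struct NmxTransport; // concrete impl arrives in M4
pub struct AsbTransport; // concrete impl arrives in M5

impl Transport for NmxTransport {
    fn name(&self) -> &'static str {
        "nmx"
    }
    fn connect(&self) -> Result<Session, TransportError> {
        Err(TransportError::NotImplemented(self.name()))
    }
}

impl Transport for AsbTransport {
    fn name(&self) -> &'static str {
        "asb"
    }
    fn connect(&self) -> Result<Session, TransportError> {
        Err(TransportError::NotImplemented(self.name()))
    }
}

/// Downstream code depends only on the trait, never on a concrete transport.
pub fn open(transport: &dyn Transport) -> Result<Session, TransportError> {
    transport.connect()
}

fn main() {
    let transports: [&dyn Transport; 2] = [&NmxTransport, &AsbTransport];
    for t in transports {
        println!("{}: {:?}", t.name(), open(t));
    }
}
```

Because open takes &dyn Transport, code written in M5 compiles against the stub without knowing whether the NMX side has been filled in yet.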

Definition of done: cargo build --workspace succeeds; CI is green on a clean commit; the publish-check for the stub codec crate (cargo publish --dry-run -p mxaccess-codec) passes. Dependency on Q2 (license): cargo publish --dry-run requires a resolved license field in Cargo.toml. Until Q2 in 70-risks-and-open-questions.md is settled, the publish-check is downgraded to cargo build --release -p mxaccess-codec, so M0 cannot fully satisfy the publish-check criterion until Q2 is resolved.

M1 — Codec parity

Implement every codec type from src/MxNativeCodec/:

  • MxReferenceHandle (CRC-16/IBM, 20-byte layout)
  • NmxTransferEnvelope + NmxTransferEnvelopeTemplate
  • NmxItemControlMessage (advise / supervisory / unadvise)
  • NmxWriteMessage (scalar + array, normal + timestamped)
  • NmxSecuredWrite2Message
  • NmxSubscriptionMessage (DataUpdate, SubscriptionStatus)
  • NmxReferenceRegistrationMessage + Result
  • NmxMetadataQueryMessage
  • NmxOperationStatusMessage
  • ObservedWriteBodyTemplate
  • ASB Variant + AsbStatus + RuntimeValue
  • MxStatus, MxValueKind, MxDataType, MxValue
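
As a reference point for the MxReferenceHandle checksum, here is a minimal sketch of the CRC variant most commonly called CRC-16/IBM (also catalogued as CRC-16/ARC). It is an assumption that MxReferenceHandle uses exactly this parameterisation (initial value 0x0000, reflected polynomial 0xA001, no final XOR) and the input encoding of the name is not stated here; the .NET reference remains the authority.

```rust
/// CRC-16/ARC: polynomial 0x8005 in reflected form (0xA001),
/// initial value 0x0000, no final XOR.
/// Assumption: this is the "CRC-16/IBM" the codec list refers to.
pub fn crc16_arc(data: &[u8]) -> u16 {
    let mut crc: u16 = 0x0000;
    for &byte in data {
        crc ^= byte as u16;
        for _ in 0..8 {
            // Shift right one bit; fold in the polynomial when a 1 falls out.
            crc = if crc & 1 != 0 {
                (crc >> 1) ^ 0xA001
            } else {
                crc >> 1
            };
        }
    }
    crc
}

fn main() {
    // 0xBB3D is the published check value for CRC-16/ARC over "123456789".
    assert_eq!(crc16_arc(b"123456789"), 0xBB3D);
    println!("check value ok");
}
```

A bitwise loop like this is easy to compare against Compute-Crc.ps1 output; a table-driven version can replace it later if the allocation and latency work in M6 calls for it.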

Definition of done: every Frida-captured write/advise/subscribe body that the .NET reference encodes today round-trips byte-identical through mxaccess-codec — i.e. the proven matrix in work_remain.md (scalar/array writes, advise/unadvise, single-record 0x33 DataUpdate, single-record SubscriptionStatus, the 5-byte 00 00 50 80 00 write-complete frame, and the 1-byte completion frames 0x00/0x41/0xEF preserved verbatim). Cross-validated against src/MxNativeCodec.Tests/ outputs (a fixture runner shells out to dotnet run --project src\MxNativeCodec.Tests and asserts the same bytes are produced for shared inputs).

Captures whose native behaviour the .NET reference does not yet decode are explicitly out of scope for M1 and are tracked elsewhere:

  • captures/077, captures/079-082, captures/094: buffered batch payloads (work_remain.md:176-181); deferred to M6 + R2.
  • captures/036: Activate/Suspend trigger conditions; deferred to R5.
  • Single-token WriteSecured (returns 0x80004021 before sending the body); deferred to R6.

These captures are still loaded as fixtures so their headers/envelopes round-trip, but the inner unproven payloads are preserved as opaque bytes rather than asserted against a typed decode.
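
One way to keep unproven payloads opaque while still round-tripping them is an enum with a raw-bytes fallback. The sketch below is illustrative only: the Frame type and its variant names are hypothetical, and only the byte values (the 5-byte 00 00 50 80 00 frame and the 1-byte completion frames) come from this roadmap.

```rust
/// Hypothetical frame model: typed where proven, raw bytes otherwise.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Frame {
    /// The 5-byte write-complete frame `00 00 50 80 00`.
    WriteComplete,
    /// A 1-byte completion frame (0x00, 0x41, 0xEF observed), kept verbatim.
    Completion(u8),
    /// Anything without a proven decode: preserved as opaque bytes.
    Opaque(Vec<u8>),
}

impl Frame {
    pub fn decode(bytes: &[u8]) -> Frame {
        match bytes {
            [0x00, 0x00, 0x50, 0x80, 0x00] => Frame::WriteComplete,
            [b] => Frame::Completion(*b),
            other => Frame::Opaque(other.to_vec()),
        }
    }

    pub fn encode(&self) -> Vec<u8> {
        match self {
            Frame::WriteComplete => vec![0x00, 0x00, 0x50, 0x80, 0x00],
            Frame::Completion(b) => vec![*b],
            Frame::Opaque(raw) => raw.clone(),
        }
    }
}

fn main() {
    let samples: [&[u8]; 4] = [
        &[0x00, 0x00, 0x50, 0x80, 0x00],
        &[0x41],
        &[0xEF],
        &[0xDE, 0xAD, 0xBE, 0xEF, 0x01, 0x02], // stand-in for an unproven payload
    ];
    for bytes in samples {
        let frame = Frame::decode(bytes);
        assert_eq!(frame.encode(), bytes);
        println!("{:02x?} -> {:?}", bytes, frame);
    }
}
```

The Opaque variant means a fixture never fails merely because its inner payload is not understood yet; it only fails if bytes are lost or reordered.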

M2 — DCE/RPC + NTLM + OBJREF + OXID + callback exporter

  • mxaccess-rpc: NTLMv2 client context, DCE/RPC PDU codec, TCP transport, OBJREF parser, OXID resolution, IRemUnknown::RemQueryInterface.
  • mxaccess-callback: callback exporter (RPC server with INmxSvcCallback + IRemUnknown).
  • Live probe: connect to local NmxSvc.exe, execute RegisterEngine2 + GetPartnerVersion round-trip, register a callback OBJREF, observe a status frame.

Definition of done: all three of the following must hold against a running AVEVA install (the partnerVersion-only check is insufficient on its own — it does not exercise the callback exporter, which the .NET evidence shows is the hardest part of M2 since MxNativeClient had to hand-roll INmxSvcCallback/IRemUnknown):

  1. cargo run --example connect-nmx issues RegisterEngine2 and observes partnerVersion == 6 in the response (see docs/DotNet10-Native-Library-Plan.md:64-73 for the expected value).
  2. The Rust callback exporter accepts an inbound IRemUnknown::RemQueryInterface from NmxSvc.exe and returns the negotiated INmxSvcCallback interface pointer — i.e. the server-side handshake against our exported OBJREF completes without an IRemUnknown reject.
  3. At least one INmxSvcCallback::StatusReceived frame is observed end-to-end through the Rust callback exporter (raw frame bytes captured to a fixture under tests/fixtures/m2-status-frame/).

In addition, NTLMv2 packet-integrity signatures match the .NET reference's MakeSignature outputs on a fixed challenge fixture (i.e. a byte-equivalent signature for a fixed input).

M3 — NMX session + Galaxy resolver

  • mxaccess-galaxy: tiberius-based tag resolver, user resolver. Replicates the recursive CTE from GalaxyRepositoryTagResolver.cs:209-293.
  • mxaccess-nmx: NmxClient with register_engine_2, transfer_data, add_subscriber_engine, set_heartbeat_send_interval. Builds MxReferenceHandle from resolver output + MxReferenceHandle::compute_name_signature.
  • Live probes: write TestChildObject.TestInt = 123, subscribe, receive callback.

Definition of done: scalar write + scalar subscribe round-trip live, identical bytes to captures/022-frida-write-test-int-sequence-106-108 and captures/058-frida-subscribe-testint (verified against captures/ directory listing — 077-frida-suspend-advised-scanstate is a suspend capture and belongs to R5, not M3). Re-implementations of dotnet run --project src\MxNativeClient.Probe -- --probe-session-write and --probe-session-subscribe succeed when invoked as cargo run --example session-write / --example session-subscribe.

M4 — Async Tokio façade (NMX path)

  • mxaccess::Session over NmxTransport.
  • mxaccess::Subscription as Stream<Item = Result<DataChange, Error>>.
  • Session::write, write_with_completion, write_with_timestamp, write_secured, write_secured_at, read, subscribe, subscribe_many.
  • Recovery policy + recovery events (mirroring MxNativeSession.RecoveryAttempt* events).
  • tracing instrumentation throughout.
  • Examples: connect-write-read.rs, subscribe.rs, recovery.rs, multi-tag.rs.

Definition of done: the public API is end-to-end usable without referencing mxaccess-codec directly. A consumer can get live data from roughly 30 lines of code under #[tokio::main]. cargo doc --workspace --open produces useful API docs. The examples/ programs all exit 0 against a live AVEVA install.

Parity test fixtures (verified against wwtools/mxaccesscli/):

  • A bare-array reference (e.g. Obj.Arr without brackets) returns MxStatus { category: CommunicationError, detail: 1003 }. Source: wwtools/mxaccesscli/docs/usage.md:215,299. Add as a mxaccess integration test that subscribes to a known bare-array reference and asserts the exact (category, detail) tuple.
  • Read-as-subscribe parity: a read(tag, timeout) against a tag that never publishes returns Error::Timeout(_), with no leaked advise on the server side (verified by issuing a subsequent subscribe and confirming no stale-handle error). Source: wwtools/mxaccesscli/docs/usage.md:24 and wwtools/mxaccesscli/src/MxAccess.Cli/Commands/ReadCommand.cs:14-78.
  • Verified Write parity: write_secured(tag, value, current_user_id, verifier_user_id) with current_user_id == verifier_user_id (single-user path) and with two distinct ids (two-person verification path) both succeed against a tag whose security classification permits it. Source: wwtools/mxaccesscli/src/MxAccess.Cli/Commands/WriteCommand.cs:151-155,196-199.
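
The read-as-subscribe fixture above can be modelled without a live server. The sketch below uses only the standard library: a channel stands in for the update stream and a drop guard stands in for the server-side advise. AdviseGuard, the counter, and the synchronous read signature are hypothetical; only Error::Timeout and the no-leaked-advise requirement come from the fixture description.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{self, RecvTimeoutError};
use std::sync::Arc;
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum Error {
    Timeout(Duration),
    Disconnected,
}

/// Stand-in for a server-side advise. Dropping it "unadvises".
struct AdviseGuard {
    active: Arc<AtomicUsize>,
}

impl AdviseGuard {
    fn new(active: Arc<AtomicUsize>) -> Self {
        active.fetch_add(1, Ordering::SeqCst);
        AdviseGuard { active }
    }
}

impl Drop for AdviseGuard {
    fn drop(&mut self) {
        self.active.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Read modelled as: advise, wait for the first value, unadvise on every exit path.
pub fn read(
    updates: &mpsc::Receiver<i32>,
    active: &Arc<AtomicUsize>,
    timeout: Duration,
) -> Result<i32, Error> {
    let _guard = AdviseGuard::new(Arc::clone(active));
    match updates.recv_timeout(timeout) {
        Ok(value) => Ok(value),
        Err(RecvTimeoutError::Timeout) => Err(Error::Timeout(timeout)),
        Err(RecvTimeoutError::Disconnected) => Err(Error::Disconnected),
    }
}

fn main() {
    let active = Arc::new(AtomicUsize::new(0));
    // The sender stays alive but never sends: a tag that never publishes.
    let (_tx, rx) = mpsc::channel::<i32>();
    let timeout = Duration::from_millis(20);

    assert_eq!(read(&rx, &active, timeout), Err(Error::Timeout(timeout)));
    // The guard was dropped on the timeout path, so nothing is left advised.
    assert_eq!(active.load(Ordering::SeqCst), 0);
    println!("timeout path leaves no advise behind");
}
```

Tying the unadvise to Drop rather than to the success branch is the design point: the timeout path and the error path release the advise the same way the happy path does.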

M5 — ASB transport

  • mxaccess-asb-nettcp: [MS-NMF] net.tcp framing + [MC-NBFX]/[MC-NBFS] binary message encoding (the default NetTcpBinding encoder, not SOAP/XML — see src/MxAsbClient/MxAsbDataClient.cs:660-685) + WCF custom-binary inside ASBIData base64.
  • mxaccess-asb: AsbClient with Connect, RegisterItems, Read, Write, CreateSubscription, AddMonitoredItems, Publish, Disconnect.
  • mxaccess::Session over AsbTransport; capabilities reflect ASB limits (no subscribe_buffered, no Activate/Suspend, no OperationComplete outside the proven write-completion frame).
  • DPAPI shared-secret read on Windows; explicit AsbCredentials::shared_secret(&[u8]) constructor as escape hatch.
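
"Capabilities reflect ASB limits" could look roughly like the sketch below. All names here (Capabilities, its fields, the NMX and ASB constants, Error::Unsupported, check_subscribe_buffered) are hypothetical, and the NMX profile is an assumption inferred from the ASB exclusions listed above rather than something this roadmap states outright.

```rust
/// What a transport can do. Field names are illustrative assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Capabilities {
    pub subscribe_buffered: bool,
    pub activate_suspend: bool,
}

impl Capabilities {
    /// Assumed NMX profile: both features available.
    pub const NMX: Capabilities = Capabilities {
        subscribe_buffered: true,
        activate_suspend: true,
    };
    /// ASB profile per the M5 bullet: neither feature is offered.
    pub const ASB: Capabilities = Capabilities {
        subscribe_buffered: false,
        activate_suspend: false,
    };
}

#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    Unsupported(&'static str),
}

/// Gate a call on a capability instead of failing deep inside the transport.
pub fn check_subscribe_buffered(caps: Capabilities) -> Result<(), Error> {
    if caps.subscribe_buffered {
        Ok(())
    } else {
        Err(Error::Unsupported("subscribe_buffered"))
    }
}

fn main() {
    println!("nmx: {:?}", check_subscribe_buffered(Capabilities::NMX));
    println!("asb: {:?}", check_subscribe_buffered(Capabilities::ASB));
}
```

A caller can branch on the capability flags up front, which keeps transport differences visible in the API rather than surfacing as runtime protocol errors.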

Definition of done: cargo run --example asb-subscribe -- --tag TestChildObject.TestInt succeeds against a live ASB endpoint. Round-trip parity with dotnet run --project src\MxAsbClient.Probe. Type matrix in mxaccess-asb covers what work_remain.md:108-113 documents as proven: scalar Boolean, Int32, Float, Double, String, DateTime, Duration, plus "deployed array tags" (the array shapes actually exercised against the live VM, not all eight scalar arrays). Less-common ASB types and the unexercised scalar array shapes are deferred and added only as needed by real deployed tags, per work_remain.md:110.

M6 — Compatibility shim + production hardening

  • mxaccess-compat: LMXProxyServer-shaped methods on top of Session.
  • subscribe_buffered (NMX feature) — guarded by BufferedOptions; no synthesis if provider returns single-sample batches.
  • Performance pass: zero-copy frame parsing where possible (bytes::Bytes), pre-allocated BytesMut per session, codec allocation count benchmarked.
  • Optional metrics feature emitting counters / histograms.
  • Docs: cargo doc published; cargo public-api baseline established.
  • Release: cargo publish all crates.

Definition of done: the codec hits the per-write allocation target from R12 (< 5 allocations per write at steady state, measured via cargo bench with a counting allocator); live subscribe under churn does not allocate per-message; cargo public-api produces a stable surface that clears review.
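
A counting allocator of the kind mentioned above can be built from the standard library alone. The following is a minimal sketch, not the project's benchmark harness: CountingAlloc, count_allocations and encode_write are hypothetical names, and the counter is process-wide, so allocations made by other threads during the measured closure are included.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts every allocation request.
pub struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and returns its result plus the number of allocations observed.
pub fn count_allocations<R>(f: impl FnOnce() -> R) -> (R, usize) {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let result = f();
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    (result, after - before)
}

/// Hypothetical encoder that reuses a caller-owned buffer.
fn encode_write(buf: &mut Vec<u8>, value: i32) {
    buf.clear();
    buf.extend_from_slice(&value.to_le_bytes());
}

fn main() {
    let mut buf: Vec<u8> = Vec::with_capacity(64);
    let (_, reused) = count_allocations(|| encode_write(&mut buf, 123));
    let (_, fresh) = count_allocations(|| std::hint::black_box(vec![0u8; 64]));
    println!("reused buffer: {reused} allocations, fresh Vec: {fresh} allocations");
}
```

The default realloc and alloc_zeroed implementations on GlobalAlloc route through alloc, so buffer growth is counted too; a benchmark asserting the R12 target would compare the measured count against the threshold for a steady-state write.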

cargo bench latency numbers are reported but not gating — this matches the V1 non-goal "we measure but don't gate beyond M6's loose acceptance bar" below. There is no .NET microbench harness to compare against (the .NET reference ships probes and assertion-style runners, not benchmarks); building one is out of scope for V1. If a future milestone adds a comparison harness, document it here and only then can a "comparable to .NET" clause be added without contradicting the non-goal.

Validation strategy

Three lines of defense against regression.

1. Round-trip fixtures

Every byte sequence in captures/0NN-frida-* and analysis/frida/*.tsv is a fixture. Test cases load the bytes, decode them, re-encode, and assert equality. New scenarios add new fixtures, never modify old ones. Fixtures live under crates/mxaccess-codec/tests/fixtures/ (copied rather than linked, for the cross-platform reasons given under M0).

2. Live probes

Per-milestone live probes mirror the .NET probes (MxNativeClient.Probe, MxAsbClient.Probe). They run only when MX_LIVE env var is set, match the .NET command-line surface, and print the same artifacts. The Rust examples are the canonical live tests.

CI lane status (V1). Live probes require a Windows runner with AVEVA System Platform installed and a populated Galaxy DB. AVEVA System Platform is a licensed Windows-only install with a SQL Galaxy attached, so a hosted CI runner cannot be spun up from a public image. V1 ships without a hosted CI lane for live probes — they run only on the maintainer's workstation, gated by MX_LIVE=1. PRs that touch the live-probe surface (anything under crates/*/examples/ invoked when MX_LIVE is set, plus mxaccess-galaxy and the NMX/ASB transport crates' integration tests) require a screenshot or capture log from a successful local run attached to the PR. Hosted CI for milestones M2/M3/M4/M5 covers cargo build, cargo test --workspace (non-live), and cargo clippy only. Building a hosted live-probe lane (a pinned AVEVA VM image + Galaxy seed snapshot, owned by the project) is a stretch goal post-V1; it is not a V1 deliverable.
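
The MX_LIVE gate can be expressed as a small helper that live examples and tests call before touching the network. This is a sketch under assumptions: the function names are hypothetical, and treating an empty value or "0" as "not set" is a choice made here, not something the roadmap specifies.

```rust
use std::env;

/// Pure rule, kept separate from process state so it is easy to test.
/// Assumption: empty and "0" count as disabled; any other value enables.
pub fn live_enabled_from(value: Option<&str>) -> bool {
    match value {
        Some(v) => {
            let v = v.trim();
            !v.is_empty() && v != "0"
        }
        None => false,
    }
}

/// Reads the MX_LIVE environment variable named in this section.
pub fn live_enabled() -> bool {
    live_enabled_from(env::var("MX_LIVE").ok().as_deref())
}

fn main() {
    if !live_enabled() {
        println!("MX_LIVE not set; skipping live probe");
        return;
    }
    println!("MX_LIVE set; running live probe");
}
```

Exiting early with a message, rather than failing, lets hosted CI run the same binaries without an AVEVA install.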

3. Cross-implementation parity

For each milestone with a dotnet run equivalent, the same operation runs through both:

  • the .NET reference (dotnet run --project src\MxNativeClient.Probe -- --probe-session-write ...)
  • the Rust port (cargo run --example session-write -- ...)

Wireshark / Frida captures of both runs are diffed; any byte-level divergence is a regression.

Caveat — parity is not correctness. Byte-parity tests confirm the Rust port matches the .NET reference; they do not confirm correctness in the absolute sense. Specifically, the completion-only frame mappings (0x00, 0x41, 0xEF, plus MXSTATUS_PROXY[] conversion — see work_remain.md:170-174) are unmapped in both implementations; both preserve them verbatim. If the .NET reference is wrong about one of these bytes, the Rust port will also be wrong, and the parity test will still pass green. R3 and R4 in 70-risks-and-open-questions.md track this. These frames are marked "preserved verbatim" rather than "verified correct" in the milestone DoDs that touch them.
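
The byte-level diff between a .NET capture and a Rust capture reduces to finding the first offset at which two buffers disagree. A minimal sketch follows; first_divergence is a hypothetical helper name, and extracting payload bytes from Wireshark or Frida output is out of scope here.

```rust
/// Returns the offset of the first differing byte, or None if the buffers match.
/// A length mismatch counts as a divergence at the end of the shorter buffer.
pub fn first_divergence(a: &[u8], b: &[u8]) -> Option<usize> {
    match a.iter().zip(b).position(|(x, y)| x != y) {
        Some(offset) => Some(offset),
        None if a.len() != b.len() => Some(a.len().min(b.len())),
        None => None,
    }
}

fn main() {
    // Stand-ins for payload bytes captured from the two implementations.
    let dotnet = [0x00, 0x00, 0x50, 0x80, 0x00];
    let rust_ok = [0x00, 0x00, 0x50, 0x80, 0x00];
    let rust_bad = [0x00, 0x00, 0x50, 0x81, 0x00];

    println!("matching run:  {:?}", first_divergence(&dotnet, &rust_ok));
    println!("diverging run: {:?}", first_divergence(&dotnet, &rust_bad));
}
```

As the caveat above notes, a None result only says the two implementations agree, not that either one is correct.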

Build & test commands

To be added to CLAUDE.md "Common commands" once rust/ exists:

```powershell
# Workspace-wide
cargo build --workspace
cargo test --workspace
cargo clippy --workspace -- -D warnings
cargo fmt --check

# Single crate
cargo test -p mxaccess-codec

# Live probes (require AVEVA + Galaxy DB). Credentials come from Infisical via
# wwtools/secrets/Get-Secret.ps1; never inline plaintext. The setup script
# fetches the WW_VM_ADMIN_* secrets and (when present) the AVEVA-specific
# ASB_SHARED_SECRET, then exports MX_LIVE, MX_NMX_HOST, MX_GALAXY_DB,
# MX_TEST_USER, MX_TEST_DOMAIN, MX_TEST_PASSWORD, MX_ASB_SHARED_SECRET.
. .\tools\Setup-LiveProbeEnv.ps1                      # dot-source so env vars persist
cargo test -p mxaccess --features live -- --ignored

# CI / dev fallback when ASB shared secret is not yet in Infisical:
. .\tools\Setup-LiveProbeEnv.ps1 -SkipAsbSecret
# ...then construct AsbCredentials::shared_secret(&[u8]) explicitly in the test.

# Examples
cargo run --example connect-write-read
cargo run --example subscribe -- --tag TestChildObject.TestInt
cargo run --example asb-subscribe -- --tag TestChildObject.TestInt

# Benchmarks (M6)
cargo bench -p mxaccess-codec
```

Sequencing dependencies

| Milestone | Depends on | Blocks |
|-----------|------------|--------|
| M0 | nothing | M1 |
| M1 | M0 | M2, M3, M5 |
| M2 | M0, M1 | M3 |
| M3 | M1, M2 | M4 |
| M4 | M3 | M6 (NMX) |
| M5 | M0, M1 (codec only; ASB does not need M2/M3) | M6 (ASB) |
| M6 | M4 (NMX consumers) or M5 (ASB consumers) | release |

M5 can be developed in parallel with M3/M4 because ASB does not depend on DCE/RPC and because the Transport trait (plus the placeholder Session shape it returns) is defined in M0, not M4. M0 publishes the empty/stub trait; M4 fills in the concrete NMX-side Session recovery policy, Stream<Item = DataChange>, and NmxTransport impl; M5 fills in AsbTransport against the same M0 trait. Without that M0-level trait split, M5 would block on M4 (since the Session/Transport types live in mxaccess, which sits below both transports per 30-crate-topology.md). The two transport paths converge into the same Session at M4 (NMX) and M5 (ASB).
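
The claim that M5 does not block on M4 can be checked mechanically from the "Depends on" column. The sketch below encodes the direct dependencies of M0 through M5 and walks them transitively; M6 is left out because its dependency is an either/or. The function names are hypothetical.

```rust
/// Direct dependencies copied from the sequencing table (M6 omitted: "M4 or M5").
fn deps(milestone: &str) -> &'static [&'static str] {
    match milestone {
        "M1" => &["M0"],
        "M2" => &["M0", "M1"],
        "M3" => &["M1", "M2"],
        "M4" => &["M3"],
        "M5" => &["M0", "M1"],
        _ => &[], // M0 and anything unknown have no prerequisites
    }
}

/// True if `from` depends on `on`, directly or transitively.
pub fn depends_on(from: &str, on: &str) -> bool {
    deps(from).iter().any(|d| *d == on || depends_on(d, on))
}

fn main() {
    // M5 reaches only M0 and M1, so it can proceed alongside M3/M4.
    for other in ["M2", "M3", "M4"] {
        println!("M5 depends on {other}: {}", depends_on("M5", other));
    }
    println!("M4 depends on M0: {}", depends_on("M4", "M0"));
}
```

If a later edit to the table adds an edge from M5 into the NMX chain, a check like this would surface the lost parallelism immediately.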

What this roadmap deliberately does not include (V1)

  • cargo bench numbers as gating criteria. We measure but don't gate beyond M6's loose acceptance bar.
  • Drop-in COM interop. mxaccess-compat wraps the API shape, not the COM ABI. A mxaccess-compat-com crate is post-V1.
  • Multi-runtime support (smol, async-std). Tokio only.
  • 32-bit Windows. x64 only by design.
  • Linux-first deployment. Linux is a stretch goal sitting behind feature flags; see 70-risks-and-open-questions.md Q3.
  • Full OperationComplete parity. Bound to whether the .NET reference can prove the trigger conditions; see R3/R4 in 70-risks-and-open-questions.md.