Files
Joseph Doherty fe2a6db786
rust / build / test / clippy / fmt (push) Has been cancelled
Initial project state: .NET reference, design, Rust port (M0+M1), evidence
Layout:
- src/                    .NET 10 x64 reference: MxNativeCodec, MxNativeClient,
                          MxAsbClient, probes, tests, harnesses. Executable spec.
- design/                 Architectural plan for the Rust port (M0–M6), error
                          model, protocol invariants, risks (R1–R16), adversarial
                          review log (review.md).
- rust/                   Rust workspace. M0 skeleton + M1 codec parity.
                          mxaccess-codec: 215 unit tests + 2 cross-implementation
                          parity tests (byte-identical against .NET reference).
                          Other crates are M0 stubs awaiting M2+.
- captures/               Frida + netsh + pcap evidence per CLAUDE.md
                          ("captures are evidence, not throwaway logs").
- analysis/               Decompiled C# (frida/proxy/decompiled-*),
                          Ghidra exports for native DLLs (`exports/` only —
                          working state at `projects/` and AVEVA's input
                          binaries at `input/` are gitignored).
- docs/                   Reverse-engineering reference docs.
- tools/                  Setup-LiveProbeEnv.ps1 (Infisical credential fetcher),
                          Compute-Crc.ps1 (.NET parity helper).
- .github/workflows/      Rust CI: fmt + build + test + clippy on Windows.
- LICENSE                 MIT (Joseph Doherty, 2026).

Verified:
- cargo test --workspace → 217 passed (215 unit + 2 .NET parity), 0 failed
- cargo clippy --workspace -- -D warnings → clean
- cargo fmt --all -- --check → clean
- cargo publish --dry-run -p mxaccess-codec → packages cleanly

Excluded from history (see .gitignore):
- **/bin, **/obj, **/target — build artifacts
- analysis/ghidra/projects/ — Ghidra working state (regenerable)
- analysis/ghidra/input/ — AVEVA proprietary DLLs (vendor IP)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 06:21:00 -04:00

57 KiB
Raw Permalink Blame History

Adversarial design review

Generated: 2026-05-05. Reviewer: Claude general-purpose subagent (per-file, hostile framing).

This is a challenge review — not a style pass. Reviewers are instructed to question implementation choices, design tradeoffs, and assumptions; verify every load-bearing claim against the cited evidence in src/ (.NET reference), docs/, analysis/, captures/; and surface where the design could fail under real-world conditions.

Status (2026-05-05)

All findings have been addressed across three triage passes.

Severity Count Status
[BLOCKER] 24 All resolved (cluster pass 1)
[MAJOR] ~26 All resolved (cluster pass 2)
[MEDIUM] 3 All resolved (cluster pass 1)
[MINOR] ~14 All resolved (cluster pass 3)
[NIT] ~7 All resolved (cluster pass 3)

Each finding bullet below is prefixed with [RESOLVED] to indicate the design doc has been corrected. The original finding text is preserved verbatim as the audit trail; see git log design/*.md for the specific edits. New risks added in response: R13 (recordCount != 1 panic risk), R14 (fabricated 0x80004021 → StaleItem mapping), R15 (Drop-time async cleanup hazards), R16 (crypto/auth crate maintenance drift). Severity tiers (P0/P1/P2) were added to all R-items.

Severity-tag legend (in original review):

  • BLOCKER — must fix before implementation; protocol or safety bug, or load-bearing claim that is fabricated / contradicts evidence.
  • MAJOR — load-bearing assumption that is unsupported, under-specified, or likely wrong.
  • MEDIUM — moderate-severity finding falling between MAJOR and MINOR.
  • MINOR — clarity, consistency, or naming.
  • NIT — style / preference.

design/00-overview.md

  • [RESOLVED] [BLOCKER] Doc lists register, write, advise, read as Transport trait primitives (design/00-overview.md:61) and the public API (design/10-raw-layer.md "Read" section, line 199) confirms ReadAsync is "implemented as a transient subscription read". Putting read on the Transport trait alongside the actual wire primitives misrepresents the protocol — there is no NMX read primitive. MxNativeSession.ReadAsync (src/MxNativeClient/MxNativeSession.cs:312351) implements read as SubscribeAsync + first-callback-result + dispose. If the Transport trait shape implies read is transport-level, every transport must reimplement the subscribe-then-cancel dance instead of getting it once at the session layer. This contradicts principle 1 ("do not fabricate protocol behavior") by inventing a wire primitive that does not exist.
  • [RESOLVED] [MAJOR] Diagram labels mxaccess-asb-soap as "NetTcp + SOAP" with "DH/HMAC/AES" (design/00-overview.md:7577). The DH/HMAC/AES claim is supported (src/MxAsbClient/AsbSystemAuthenticator.cs:2334, 73122). But the "SOAP" label is misleading: the .NET reference uses NetTcpBinding(SecurityMode.None) with the default binary message encoder (src/MxAsbClient/MxAsbDataClient.cs:660663). SOAP envelope bytes only exist as an in-memory infoset; on the wire it's MS-NMF-framed binary XML. design/70-risks-and-open-questions.md:918 (R1) confirms the plan is to "hand-roll the framing per [MS-NMF]" — no SOAP text framing. Crate name mxaccess-asb-soap and the diagram label will mislead implementers into thinking SOAP envelopes are on the wire.
  • [RESOLVED] [MAJOR] Principle 3 says "Raw layer is unsafe-free. No raw pointers, no transmute, no FFI in the public surface" (design/00-overview.md:33), but principle 6 mandates windows-rs for "OBJREF building, IPID/OXID/OID handling, GUID literals" (line 36) and the diagram shows mxaccess-rpc consuming windows crate (line 88). Every windows-rs COM call is unsafe fn. The .NET reference uses Type.GetTypeFromProgID + Activator.CreateInstance (src/MxNativeClient/ManagedNmxService2Client.cs:3336); windows-rs exposes equivalents only via unsafe { CoCreateInstance(...) }. The doc never reconciles this — either "raw layer is unsafe-free in the public surface" (i.e. internal unsafe allowed) or windows-rs types stay out of mxaccess-rpc. As written, principles 3 and 6 are in direct tension.
  • [RESOLVED] [MAJOR] Principle 8 says "No spawn from inside Drop; no blocking calls inside async fn" (design/00-overview.md:38) but principle 2 in "Async layer" (line 10) promises "drop … cancellation" and 70-risks-and-open-questions.md:24 / R3 design references Stream cleanup. Drop-based cancellation of an in-flight subscription on NmxSvc requires sending an unadvise/UnregisterReference frame — that is I/O. If Drop cannot spawn and cannot block, the only options are (a) abandon the server-side subscription leak (NMX leak in the service process), (b) require an explicit close().await and downgrade Drop to a panic-in-debug, or (c) hand the cleanup to a background reaper task (which itself was spawned at session start, not inside Drop). The doc never specifies which, and "drop cancellation" implies (a)/(c) without saying so. Compare MxNativeSession.DisposeAsync/explicit unregister flows (src/MxNativeClient/MxNativeSession.cs uses Volatile, recovery, etc.) — none are Drop-equivalent.
  • [RESOLVED] [MAJOR] Crate-mapping table (design/00-overview.md:2225) splits MxNativeClient into mxaccess-rpc, mxaccess-nmx, mxaccess-callback for transport and puts MxNativeSession and MxNativeCompatibilityServer solely in the async mxaccess crate ("raw layer ends at transport"). But the .NET MxNativeSession is the only place where Galaxy resolution + handle lookup + correlation-id bookkeeping + recovery state are wired (src/MxNativeClient/MxNativeSession.cs:90125, 312351, 573). If mxaccess-nmx exposes only "INmxService2 client + envelope" (design/00-overview.md:69), then Session must reimplement every cross-cutting concern (subscription registry, recovery, callback routing) inside the async crate. Either the raw layer is incomplete (cannot be used standalone for "byte-level control" as line 12 promises, because there's no session/correlation surface), or the split is wrong. The 1:1 mapping in the table papers over this.
  • [RESOLVED] [MEDIUM] Principle 9 says "ASB-only paths return Error::Unsupported on NmxTransport and vice versa; capability is queryable" (design/00-overview.md:39), but non-goal #5 (line 48) says "the Rust port routes those [callback-only ops] to NMX; ASB owns the regular tag data plane only". A consumer holding an AsbTransport and calling activate(item) will get Error::Unsupported — but the natural expectation (set by the non-goals paragraph) is that the high-level mxaccess Session transparently routes callback-only ops to NMX even when ASB is the data plane. The doc never says how the dual-routing works at the Session level — does the session hold both transports? Is the user expected to construct two? Principle 9 ("two transports, one façade") and non-goal 5 are in tension.
  • [RESOLVED] [MEDIUM] Line 16: "every async method bottoms out in a sync codec call (NmxTransferEnvelope.Encode)". Searched src/MxNativeCodec — there is no NmxTransferEnvelope.Encode. The class is NmxTransferEnvelopeTemplate with Encode(ReadOnlySpan<byte>) (src/MxNativeCodec/NmxTransferEnvelopeTemplate.cs:33). Minor naming drift, but in a doc whose principle 1 is "do not fabricate" and which cites filenames as evidence, citing a non-existent method weakens the rhetorical case.
  • [RESOLVED] [MEDIUM] Principle 7 cites dbo.gobject / dbo.instance / dbo.dynamic_attribute as the SQL surface (design/00-overview.md:37). Verified at src/MxNativeClient/GalaxyRepositoryTagResolver.cs:215, 253, 257. However CLAUDE.md (project root) lists the surface as aa_attribute / aa_object / mx_attribute_category. Grepping the .cs file shows zero hits for any of those names. CLAUDE.md is wrong, the design doc is right, but the design doc should call this out as a CLAUDE.md correction — otherwise the next implementer who trusts CLAUDE.md will write SQL against tables that don't exist.

design/10-raw-layer.md

  • [RESOLVED] [BLOCKER] Doc Item-control table claims Advise (plain) opcode 0x1f = 37 bytes, distinct from AdviseSupervisory 0x1f = 39 bytes (design/10-raw-layer.md:99-104). The .NET reference does not support a 37-byte plain-advise variant: NmxItemControlMessage.Parse accepts only AdviseSupervisory or UnAdvise (src/MxNativeCodec/NmxItemControlMessage.cs:46-48), and MxNativeCompatibilityServer.AdviseSupervisory aliases plain Advise to it (src/MxNativeClient/MxNativeCompatibilityServer.cs:256-258). The doc's three-row table fabricates a "plain advise" length the codec rejects.
  • [RESOLVED] [BLOCKER] Doc claims Boolean write body has "1 value byte (0xFF/0x00)" giving 37 bytes total (design/10-raw-layer.md:114). Source actually emits 4 value bytes: [0xff,0xff,0xff,0x00] or [0x00,0xff,0xff,0x00] (src/MxNativeCodec/NmxWriteMessage.cs:257). Total still 37 because the suffix is 11+4 (not 14+4), but the per-byte breakdown in the doc is wrong — the "1 value byte" claim will lead a Rust port to emit a 34-byte body that the receiver rejects.
  • [RESOLVED] [MAJOR] Doc String-write row says "Total = 26 + N" (design/10-raw-layer.md:118). True size is KindOffset(17)+1+4+4+N+14+4 = 44+N (src/MxNativeCodec/NmxWriteMessage.cs:150). The row drops the 14+4 suffix that every other row in the same table includes — an inconsistency in the column meaning that will produce undersized buffers.
  • [RESOLVED] [MAJOR] Doc array-body row says "28 + N + 18" with layout "4-byte marker + count(u16) + width(u16) + elements" (design/10-raw-layer.md:120). Code shows count is at body offset 22 and width at 24, with two separate gap regions: bytes 18-21 are zero-padding and bytes 26-27 are zero-padding (src/MxNativeCodec/NmxWriteMessage.cs:179-184). The "4-byte marker" framing implies a single contiguous marker before count; in reality it's a 4-byte gap and a 2-byte gap surrounding the count/width. Documenting it that way will round-trip wrong on captures.
  • [RESOLVED] [MAJOR] Doc claims MxStatus.source: MxStatusSource (design/10-raw-layer.md:178). The .NET reference field name is DetectedBy (src/MxNativeCodec/MxStatus.cs:31). Drift in field name across the parity boundary; tests that round-trip serialize will diverge.
  • [RESOLVED] [MAJOR] Doc claims wire kinds 0x410x46 for arrays in MxValue/write (design/10-raw-layer.md:120, 149). Encoder side never emits 0x46NmxWriteMessage.GetWireKind collapses StringArray and DateTimeArray both to 0x45 (src/MxNativeCodec/NmxWriteMessage.cs:107). Decoder side does treat 0x46 as DateTimeArray (src/MxNativeCodec/NmxSubscriptionMessage.cs:173, 275). The doc lumps both directions as "0x41..0x46", masking an asymmetry the Rust port must replicate.
  • [RESOLVED] [MAJOR] Doc says "Duration for ElapsedTime: 4-byte milliseconds on the wire" (design/10-raw-layer.md:220). Source decodes the wire as signed i32 (BinaryPrimitives.ReadInt32LittleEndian, src/MxNativeCodec/NmxSubscriptionMessage.cs:252) and produces TimeSpan.FromMilliseconds(milliseconds). Rust std::time::Duration is unsigned — a negative ms value (which the encoding allows) will panic or be clamped. Either spec a signed type (time::Duration or i64 ms) or document the negative-handling policy.
  • [RESOLVED] [MAJOR] Doc DataUpdate parser sketch says recordCount(i32, typically 1) and shows generic records[recordCount] (design/10-raw-layer.md:144-147). The .NET reference hard-rejects any record count != 1: if (recordCount != 1) throw (src/MxNativeCodec/NmxSubscriptionMessage.cs:71-74). The Rust sketch implies general support for N records that the executable spec does not provide; either lift the constraint with capture evidence or document it as an asserted invariant.
  • [RESOLVED] [MAJOR] Doc says NTLM Type1 negotiate flags are "Unicode | RequestTarget | Sign | Seal | ExtendedSessionSecurity | Negotiate128 | KeyExchange" (design/10-raw-layer.md:246). Searched ManagedNtlmClientContext.cs for the actual flag set used in the negotiate emission — the file does derive sign/seal/sequence keys (src/MxNativeClient/ManagedNtlmClientContext.cs:177-200) and uses _user.ToUpperInvariant() for response key derivation (line 79), but the doc lists no citation for the Type1 flag bitfield. No line/offset is referenced; the flag list is a claim a Rust port could mis-encode without a fixture. Add a citation or capture.
  • [RESOLVED] [MAJOR] Doc puts NmxTransferEnvelope.reserved as i32 "preserved from observed; default 0" (design/10-raw-layer.md:78-79). The .NET encoder unconditionally writes 0 there (src/MxNativeCodec/NmxTransferEnvelope.cs:91) and Parse does not extract or expose the reserved bytes at all — there is no preservation. The doc's "round-trip preserver" promise (design/10-raw-layer.md:92) cannot be satisfied unless the Rust codec adds a field the .NET reference has thrown away. Either fix the .NET reference or drop the claim.
  • [RESOLVED] [MAJOR] Doc says SubscriptionStatus has recordCount(i32) + operationId(GUID 16) + correlationId(GUID 16) + records[recordCount] immediately after cmd+version (design/10-raw-layer.md:135-140). Source orders fields the same but reads recordCount at offset 3, operationId at 7, correlationId at 23, records at 39 (src/MxNativeCodec/NmxSubscriptionMessage.cs:54-55, 98-99). Total agrees, but the diagram's + correlationId includes records-bearing offsets that the .NET parser splits into two distinct paths (DataUpdate has no correlationId; SubscriptionStatus does). Doc places correlationId in both records (line 135-140 union) — Rust port must not parse correlationId for 0x33.
  • [RESOLVED] [MAJOR] Doc warns about .NET ToLowerInvariant() vs Rust str::to_lowercase() Unicode divergence and proposes a _legacy variant using "unicase::Ascii::to_lowercase" (design/10-raw-layer.md:66-68). unicase::Ascii::to_lowercase only handles ASCII — it is not a substitute for ToLowerInvariant() on non-ASCII Galaxy tag names. If the captured tags include non-ASCII (Turkish, German), the proposed legacy fallback will produce a different CRC than the .NET reference, not the same one. The mitigation as written makes the divergence worse.
  • [RESOLVED] [MAJOR] MxReferenceHandle Rust struct uses pub for every primitive field including fields recomputed from name signatures (design/10-raw-layer.md:28-41). The original: Bytes preservation pattern (design/10-raw-layer.md:226) cannot apply here because the struct is Copy with no buffer. A consumer mutating object_signature directly will desync from compute_name_signature(tag_name) — there is no invariant binding the two together. Either expose the handle as opaque with accessors, or document that signatures are caller-owned and the codec will not recompute.
  • [RESOLVED] [NIT] Doc names callback opnums DataReceivedRaw (3) and StatusReceivedRaw (4) (design/10-raw-layer.md:296). Source names them DataReceived/StatusReceived (src/MxNativeClient/NmxSvcCallbackMessages.cs:11-12, NmxProcedureMetadata.cs:89-101). The Raw suffix is doc-invented, will diverge from any cross-reference grep against the .NET reference.
  • [RESOLVED] [NIT] Doc claims the ElapsedTime wire kind 0x07 is part of the array set "0x41..0x46" decoded scalars (design/10-raw-layer.md:149). It is in the scalar set 0x01..0x07 — but MxValueKind does not enumerate ElapsedTimeArray, while MxValue also lacks one (design/10-raw-layer.md:200-215). If 0x47 ever appears in a capture, both .NET and the proposed Rust enums would silently drop it. Worth flagging as a known gap.

design/20-async-layer.md

  • [RESOLVED] [BLOCKER] Session::write claims a single &str reference is sufficient, but MxNativeSession.WriteAsync requires writeIndex and clientToken parameters that drive correlation of OperationStatus callbacks (src/MxNativeClient/MxNativeSession.cs:165-185). The Rust API at design/20-async-layer.md:32-49 has no clientToken, so the await cannot await a wire WriteCompleted; it can only confirm the LMX Write RPC return code. This contradicts the claim that "write" is a true async operation that completes when the wire confirms — section 310 even concedes the 5-byte completion frame is observed-only. The single-shot await model is misleading.
  • [RESOLVED] [BLOCKER] write_secured is offered unconditionally (design/20-async-layer.md:40-45) but the .NET reference explicitly throws NotSupportedException for it, citing 0x80004021 from captures 036/038/039 (MxNativeSession.cs:211-221). The Rust API surface promises a behaviour the proven stack does not deliver. Either drop it or rename to write_secured2 to match WriteSecured2Async.
  • [RESOLVED] [BLOCKER] read is described as a one-shot read(&str) -> DataChange (design/20-async-layer.md:47-49) with no timeout argument. The .NET reference's ReadAsync is explicitly a timed advise/first-callback/unadvise dance and requires TimeSpan timeout > 0 (MxNativeSession.cs:312-359). The Rust API has no semantic for the case where no callback ever arrives — tokio::time::timeout can drop the future, but on drop the Rust design does not say it issues UnAdvise (only Subscription drop does). This leaks an advise on every read timeout.
  • [RESOLVED] [MAJOR] Subscription is declared Stream<Item = Result<DataChange, Error>> (design/20-async-layer.md:70) without specifying whether Err is terminal. The .NET reference fans out records via CallbackReceived and routes parse errors to a separate UnparsedCallbackReceived event (MxNativeSession.cs:590-607). The Rust design conflates these. If Err terminates the stream, transient parse errors kill an otherwise healthy subscription; if Err is recoverable, downstream while let Some(Ok(_)) consumers silently skip data. Pick one and document, or split into two streams matching .NET.
  • [RESOLVED] [MAJOR] Drop-cancellation of Subscription claims to "send UnAdvise (best-effort, fire-and-forget via tokio::spawn)" (design/20-async-layer.md:70). On runtime shutdown tokio::spawn from a Drop impl panics if no runtime is current, and during Runtime::shutdown_timeout spawned tasks are aborted before they can flush. The .NET reference disposes synchronously, sending UnAdvise per subscription on the same thread (MxNativeSession.cs:483-495). Document the runtime-shutdown path or provide an explicit async fn close().
  • [RESOLVED] [MAJOR] Session: Clone + Send + Sync with shared Arc<SessionInner> (design/20-async-layer.md:27) but no explicit close() API. Last-clone-drop running unregister_engine "best-effort" requires either tokio::spawn (same shutdown hazard as above) or block_on (forbidden by section 305). The .NET reference is IDisposable synchronous and unregisters explicitly. The design has no answer for "I want to make sure the engine is unregistered before my process exits."
  • [RESOLVED] [MAJOR] Recovery is presented as automatic on heartbeat-loss (design/20-async-layer.md:114), but MxNativeSession.RecoverConnection* is explicitly caller-driven — the .NET API exposes RecoverConnectionAsync(policy) and never auto-starts (MxNativeSession.cs:383-440). Worse: during recovery, _recoveryActive is just a flag set on inbound callbacks; in-flight writes against _service are not paused or replayed. The Rust design's promise that "the future resumes on the new connection" is unbacked — port the .NET semantics (concurrent calls fail, caller decides) or capture the gap.
  • [RESOLVED] [MAJOR] subscribe_buffered (design/20-async-layer.md:87-100) returns Stream<DataChangeBatch> but the .NET equivalent RegisterBufferedItemAsync takes itemDefinition, itemContext, and crucially an itemHandle: int (MxNativeSession.cs:272-310). The Rust API drops the handle and the (definition, context) split, hiding the dual-string requirement. A consumer cannot reproduce the captured Frida bodies through this API.
  • [RESOLVED] [MAJOR] subscribe_many(&["A.X","A.Y","A.Z"]) is described as multiplexing one callback channel and demultiplexing by correlation ID (design/20-async-layer.md:74-82). The .NET reference issues per-tag AdviseSupervisory with one CorrelationId each (MxNativeSession.cs:250-270); there is no atomicity. If the second Advise errors, the design does not specify whether the first is rolled back. A consumer expects either all-or-nothing or partial success surfaced — the doc says neither.
  • [RESOLVED] [MAJOR] Transport trait uses #[async_trait] (design/20-async-layer.md:168), forcing heap allocation per call and breaking the recently-stabilized native async fn in trait. If the project pivots to dyn-compatible native AFIT (Rust 1.75+ requires dyn Transport to use Pin<Box<dyn Future>> returning fns), the trait is not dyn-safe as written because callbacks(&self) -> CallbackStream returns a concrete struct — fine — but async fn methods are not dyn-safe without RPITIT workarounds. Pick #[async_trait] (legacy) or document that Transport is generic-only.
  • [RESOLVED] [MAJOR] RecoveryEvent enum (design/20-async-layer.md:122-127) is missing WillRetry: bool from MxNativeRecoveryFailureEvent (MxNativeSession.cs:47-51). Consumers cannot distinguish "this attempt failed but the policy will retry" from "terminal failure." Without it, downstream code cannot decide when to tear down its own state.
  • [RESOLVED] [MAJOR] quality: u16 on DataChange (design/20-async-layer.md:222) but the .NET MxStatus model has its own categories (MxStatusCategory/MxStatusSource) and the codec already exposes MxStatus. Exposing a raw u16 next to status: MxStatus invites callers to use the wrong field — the .NET reference uses Record.ToDataChangeStatus() (MxNativeSession.cs:70) as the canonical projection. Drop the u16 or document the precedence.
  • [RESOLVED] [MINOR] set_recovery_policy takes &mut session in the sample (design/20-async-layer.md:288) but Session is Clone and Arc-backed (:27). Mutation through &mut on a clone is a foot-gun: clones won't see policy changes unless they're stored behind interior mutability. Either make it &self with tokio::sync::watch or document that it must be called before any clone is made.
  • [RESOLVED] [MINOR] &'static str in Error::Unsupported (design/20-async-layer.md:204) prevents formatting the offending operation with runtime context (e.g. capability name + transport variant). Use Cow<'static, str> or a structured variant.

design/30-crate-topology.md

  • [RESOLVED] [BLOCKER] quick-xml is the wrong dep entirely. NetTcpBinding default uses BINARY message encoder (.NET MC-NMF + MC-NBFX/NBFS dictionary tables), not XML over the wire. There is no XML envelope to parse; framing is binary records with dictionary string interning (design/30-crate-topology.md:130, :247). Evidence: src/MxAsbClient/MxAsbDataClient.cs:660685 constructs NetTcpBinding(SecurityMode.None) with no override — default binding element is BinaryMessageEncodingBindingElement. The mxaccess-asb-soap crate name itself is a misnomer; needs MC-NMF/MC-NBFX framing, not SOAP/XML. quick-xml may still be needed for the small ASB control-plane XML payloads (request.ToXml() at AsbSystemAuthenticator.cs:79), but cannot frame net.tcp.
  • [RESOLVED] [BLOCKER] mxaccess-asb-soap is publish = false while mxaccess-asb (publishable) depends on it (design/30-crate-topology.md:128138). Cargo refuses cargo publish on a crate whose path-dep lacks a published version. Either publish both, or fold the framing module into mxaccess-asb.
  • [RESOLVED] [BLOCKER] rc4 = "0.1" does not match crates.io. Latest is rc4 v0.2.0 and the 0.1 line is unmaintained (design/30-crate-topology.md:245). Worse: rc4 is published by RustCrypto but flagged stale; for NTLMv2 seal/sign most projects pull cipher + a manual ARC4 or use ntlm-rs/equivalent. Also 0.1 predates the cipher 0.4 trait reform that aes 0.8/hmac 0.12 were built on.
  • [RESOLVED] [MAJOR] Pinned crypto versions form an inconsistent generation. aes = "0.8", hmac = "0.12", md-5 = "0.10", sha-1 = "0.10", pbkdf2 = "0.12" are the older cipher 0.4/digest 0.10 line, but the design says "1.83+ stable" (design/30-crate-topology.md:241250). Current crates pulled from index are aes 0.9, hmac 0.13, md-5 0.11, pbkdf2 0.13 — all bumped to digest 0.11/cipher 0.5 and require rust-version: 1.85. Mixing 0.10/0.12-line traits with 0.13-line will fail to resolve a coherent digest. Either pin the whole RustCrypto generation to one line, or bump MSRV to 1.85 to match.
  • [RESOLVED] [MAJOR] sha-1 = "0.10" is explicitly deprecated by upstream: "This crate is deprecated! Use the sha1 crate instead." (design/30-crate-topology.md:243). Pinning a deprecated crate in a fresh greenfield workspace is gratuitous.
  • [RESOLVED] [MAJOR] windows = "0.58" is significantly stale; current is 0.62.2 (design/30-crate-topology.md:253). Between 0.58 and 0.62 the COM, RPC and Cryptography modules saw breaking renames (e.g. Win32_System_Rpc surface trimmed, Security_Cryptography reorganised). Designing against 0.58 then "upgrading later" wastes work. Pin to 0.62.x.
  • [RESOLVED] [MAJOR] tiberius = "0.12" + auth-windows claim. tiberius 0.12.3 default features include winauth (SSPI) only on Windows; integrated security against MSSQL via SSPI is supported, but the design lists mxaccess-galaxy as "All Rust targets (TDS works cross-platform)" while only providing auth-windows for integrated security (design/30-crate-topology.md:9496, :203). On Linux, the only auth options are SQL logins or integrated-auth-gssapi (Kerberos). Galaxy databases in practice are domain-joined Windows boxes using NTLM/Kerberos integrated auth — Linux clients won't work without integrated-auth-gssapi and a configured KDC. The doc claims cross-platform without flagging this.
  • [RESOLVED] [MAJOR] num-bigint = "0.4" for DH ModPow is not constant-time (design/30-crate-topology.md:251). The .NET reference uses BigInteger.ModPow which is also not constant-time, but the Rust port has the chance to use a constant-time bignum (crypto-bigint). Since the DH exponent is the long-lived private key (AsbSystemAuthenticator.cs:153166), a side-channel-leaky mod_exp re-creates a defect, not parity. Flag as a security regression vs. an opportunity.
  • [RESOLVED] [MAJOR] Crate boundary: mxaccess-galaxy is split out, but the .NET reference keeps GalaxyRepositoryTagResolver.cs inside MxNativeClient namespace (namespace MxNativeClient; at line 4) (design/30-crate-topology.md:117122). Splitting into a separate crate forces mxaccess-nmxmxaccess-galaxy, but the resolver returns MxReferenceHandle (a mxaccess-codec type) and is consumed by NMX register flows — fine, no cycle. However, mxaccess-callback does NOT need mxaccess-galaxy, yet the diagram routes everything through mxaccess-nmx which depends on both. Minor coupling concern: the resolver pulls tiberius (heavy, native-tls/rustls/winauth) into every consumer of mxaccess-nmx. Should be a feature-gated optional dep.
  • [RESOLVED] [MAJOR] Feature dpapi default-on for Windows under mxaccess-asb — but the design does not mention #[cfg(feature = "dpapi")] boundaries inside mxaccess-asb (design/30-crate-topology.md:139, :202). The ASB shared secret is mandatory for the DH passphrase derivation (AsbSystemAuthenticator.cs:28, :134142). With dpapi=off and no alternate secret source spec'd, the crate cannot authenticate at all. Either remove dpapi as optional, or define an explicit SecretProvider trait that DPAPI plugs into.
  • [RESOLVED] [MAJOR] clippy::unwrap_used = deny interaction with the error model (design/30-crate-topology.md:192, design/50-error-model.md:30). Arc<str> construction from &str via Arc::<str>::from(&*s) is fine, but TypeMismatch { reference: Arc<str>, ... } formatted via thiserror #[error("... {reference} ...")] requires Arc<str>: Display, which it is — no unwrap path. Real risk: any place the codec parses a UTF-16LE name and calls String::from_utf16(...).unwrap() will trip the lint. The design needs an explicit "fallible UTF-16 decode helper" rule. Worth flagging because MxReferenceHandle parsing is core.
  • [RESOLVED] [MAJOR] MSRV 1.83 vs. dependency versions. uuid v1 features ["v4", "v7"]uuid 1.23.1 requires rust-version: 1.85.0 (design/30-crate-topology.md:188, :228, :233). Same for current tiberius indirect deps. Pinning MSRV to 1.83 while pulling uuid = "1" (= latest) is contradictory. Either pin uuid = "=1.10" (last 1.83-compatible) or raise MSRV to 1.85.
  • [RESOLVED] [MINOR] "Pinned to a recent stable Rust via rust-toolchain.toml. MSRV equals the pinned version (no separately-stated MSRV — pinning is the contract)" contradicts rust-version = "1.83" in the workspace skeleton (design/30-crate-topology.md:188, :228). Pick one policy.
  • [RESOLVED] [MINOR] Build commands include cargo run --example connect-write-read but the workspace example-target only resolves at the workspace root if a single crate owns examples (design/30-crate-topology.md:2027, :181183). With nine crates and examples/ at the workspace root (line 20), cargo run --example will fail unless examples live inside a specific crate (typically mxaccess).
  • [RESOLVED] [MINOR] License "MIT OR Apache-2.0 — to be confirmed" (design/30-crate-topology.md:226, :269). tiberius is MIT/Apache-2.0; everything else verified is MIT-or-Apache-2.0 compatible. No GPL/AGPL contamination found in the pinned set. Risk is windows-rs proxy/stub IDL re-emissions if any are vendored from Microsoft headers — flag for legal review when the codec ports OBJREF/OXID structs derived from MIDL output.
  • [RESOLVED] [NIT] "Edition 2021 initially. Edition 2024 once stable across the dependency graph" (design/30-crate-topology.md:189). Edition 2024 has been stable since Rust 1.85 (2025-02). Given pinned deps already require 1.85, just go edition 2024.

design/40-protocol-invariants.md

  • [RESOLVED] [BLOCKER] Reference registration request prefix is severely under-documented. Doc shows prefix cmd(1) + version(2) + itemHandle(i32) + correlation(GUID 16) + (-1 i16) + reservedByte + (1 i32) + itemDefinition… (design/40-protocol-invariants.md:172-180). Source NmxReferenceRegistrationMessage.cs:15 defines HeaderLength = 55, with explicit writes only at offsets 0,1,3,7,23,27 (src/MxNativeCodec/NmxReferenceRegistrationMessage.cs:80-87). Bytes 25-26 and 31-55 (≈26 bytes) are zero-initialized but never described. CLAUDE.md says "preserve unknown bytes"; the doc elides them. Rust port reading the spec will encode a 30-byte prefix instead of 55.
  • [RESOLVED] [BLOCKER] Write body common prefix is wrong. Doc (design/40-protocol-invariants.md:108) says "cmd(1) + version(2) + padding(2) + handle_projection(14)" and locates wireKind at offset 17. Source NmxWriteMessage.cs:11-13 has HandleProjectionOffset = 3, HandleProjectionLength = 14, KindOffset = 17 — so layout is cmd(1) + version(2) + handle_projection(14), no 2-byte padding. The 14 bytes are at offsets 3..17, leaving wireKind at 17. Doc inserts a phantom 2-byte padding.
  • [RESOLVED] [BLOCKER] String/DateTime write total size is wrong. Doc (design/40-protocol-invariants.md:118-119) says total = "26 + N". NmxWriteMessage.cs:150 computes KindOffset(17) + 1 + 4 + 4 + N + 14 + 4 = 44 + N. The 26+N figure is off by 18 bytes. Anyone implementing to spec will under-allocate.
  • [RESOLVED] [BLOCKER] Subscription array header offset width disagrees with the write encoder. Doc (design/40-protocol-invariants.md:120) says array layout is count(u16) + width(u16) + 2N + suffix. Encoder agrees: NmxWriteMessage.cs:181-182 writes count u16 at body[22], elementWidth u16 at body[24]. But the decoder at NmxSubscriptionMessage.cs:264-265 reads count as u16 at body+4 and elementWidth as i32 at body+6. Either the encoder or decoder is wrong, and the doc only captures one shape. This is a load-bearing inconsistency that the BoM doc must resolve, not paper over.
  • [RESOLVED] [BLOCKER] Boolean write value section description is wrong. Doc (design/40-protocol-invariants.md:114) says "1 byte (0xFF/0x00) + 3 reserved bytes". NmxWriteMessage.cs:257 encodes [0xff, 0xff, 0xff, 0x00] (true) or [0x00, 0xff, 0xff, 0x00] (false) — bytes 1 and 2 are 0xFF, not "reserved". Reserved implies don't-care; native sets them to 0xFF. CLAUDE.md mandates preserving unknown bytes; calling them "reserved" invites zeroing.
  • [RESOLVED] [BLOCKER] AdviseSupervisory and Advise share opcode 0x1f but the parser rejects plain Advise. Doc (design/40-protocol-invariants.md:97-99) lists three commands: Advise (plain) 0x1f / 37 bytes, AdviseSupervisory 0x1f / 39 bytes, UnAdvise 0x21 / 37 bytes. But NmxItemControlMessage.cs:46-49 rejects command is not (AdviseSupervisory or UnAdvise) — a 37-byte 0x1f message fails to parse. Either the doc must remove "plain Advise" or call out that the codec emits/accepts only the supervisory form (39 bytes). The duplicate enum entries Advise = 0x1f, AdviseSupervisory = 0x1f in the source signal this is unresolved (NmxItemControlMessage.cs:7-8).
  • [RESOLVED] [MAJOR] recordCount != 1 invariant for DataUpdate is missing. NmxSubscriptionMessage.cs:71-74 hard-throws if recordCount != 1 for the 0x33 DataUpdate frame. The doc treats recordCount(i32, typically 1) as a casual hint (design/40-protocol-invariants.md:161) and only mentions multi-record as "not yet wire-proven" (design/40-protocol-invariants.md:303). For a bill-of-materials this is an enforced invariant, not a soft expectation. The Rust port must replicate the throw to match parity.
  • [RESOLVED] [MAJOR] HRESULT 0x8007139F meaning conflicts across docs and elides the canonical name. Doc (design/40-protocol-invariants.md:272) says "Uninitialized object". design/50-error-model.md:133 maps it to EngineNotRegistered. docs/Capture-Run-2026-04-25.md:888 and docs/MXAccess-Public-API.md:326 document it as ERROR_INVALID_STATE, observed from ProcessActivateSuspend2. The 40-doc cites neither file and gives a folkloric description.
  • [RESOLVED] [MAJOR] Write2 timestamped suffix description is incoherent. Doc (design/40-protocol-invariants.md:131-132): "8-byte FILETIME inserted between offsets 12 and 19 of the suffix region". NmxWriteMessage.cs:240-251 WriteTimestampedSuffix actually replaces the 8-byte filler at suffix offsets 2..10 with the FILETIME (and changes the leading i16 from -1 to 0). The "between 12 and 19" wording does not match any offset in the source.
  • [RESOLVED] [MAJOR] tail value for item-control is not specified. NmxItemControlMessage.cs:88 defaults tail = 3 for advise/unadvise. The doc only describes shape (tail(4)) and never names the constant or its value (design/40-protocol-invariants.md:102). The Rust port will pick a different value and fail at the responding NMX. This is the kind of byte-exactness this BoM is supposed to nail down.
  • [RESOLVED] [MAJOR] Envelope ProtocolMarker described as 0x0201 int32 — but bytes 38-41 hold the LE u32. Doc (design/40-protocol-invariants.md:67) calls offset 38..42 a single i32 = 0x0201. NmxTransferEnvelope.cs:99 writes ProtocolMarker = 0x0201 as Int32LE at offset 38. So bytes are 01 02 00 00. The doc's claim is technically OK but the BoM should call out the wire-byte sequence to prevent endianness mistakes, since the value visually reads as a high byte 0x02. Minor relative to others but worth noting.
  • [RESOLVED] [MAJOR] Reserved offset 6 in the envelope is described as "preserved from observed (default 0)" but Parse does not retain it. Doc (design/40-protocol-invariants.md:59) promises round-trip, but NmxTransferEnvelope.cs:39-75 reads only Version, InnerLength, ProtocolMarker, etc., and discards bytes 6..10 entirely. The Encode path always writes 0 (line 91). This violates CLAUDE.md "preserve unknown bytes" and the doc misrepresents the implementation.
  • [RESOLVED] [MAJOR] DCE/RPC bind UUID and active-interface UUID lack file:line citations. Two of the most load-bearing values for activation (design/40-protocol-invariants.md:13-14) cite "docs/Loopback-Protocol-Findings.md" with no line number, while every other COM identifier in the same table has a :N citation. For a BoM this is a missing receipt.
  • [RESOLVED] [MAJOR] CRC-16 attribute signature initial value is "0" but the doc never references the spot in MxReferenceHandle.cs that proves 0 (vs e.g. 0xFFFF). Doc (design/40-protocol-invariants.md:90) asserts initial value 0. MxReferenceHandle.cs:51 does start with ushort crc = 0, so the claim is correct but the byte order step (low byte then high byte of UTF-16LE per char) is shown at lines 53-56 — these need explicit :LINE anchors, not just a range. Lines 108-119 are the inner CRC step, not the per-char loop.
  • [RESOLVED] [MAJOR] Status-detail table omits Source field and the Detail is short not byte. Doc (design/40-protocol-invariants.md:279-291) lists detail codes as bare integers. MxStatus.cs:32 declares Detail as short (signed 16-bit). The two callbacks (DataUpdate quality u16, status records i32) use different widths (NmxSubscriptionMessage.cs:126,132,136). Whether the wire is i32 (record status) or short (MxStatus.Detail) matters for sign extension on detail 8017. Doc collapses both.
  • [RESOLVED] [MINOR] DataUpdate record layout in doc claims quality(u16) + timestamp_filetime(i64) + wireKind(u8) + value(N) after status — confirmed by NmxSubscriptionMessage.cs:126-143 for hasDetailStatus=false. SubscriptionStatus claims status(i32) + detailStatus(i32) + quality(u16) + timestamp + wireKind + value, also confirmed. But DataUpdate header is cmd(1) + version(2) + recordCount(4) + operationId(16) = 23 bytes then records. SubscriptionStatus header is the same 23 bytes plus a per-message correlationId(16) at offset 23, records start at 39 (NmxSubscriptionMessage.cs:98-99). Doc shows the correlationId for SubscriptionStatus only on line 153, which is correct, but the BoM table lacks an explicit "header length 23 vs 39" which would help an implementer. NIT level since the byte math is correct.
  • [RESOLVED] [MINOR] Boolean array on the wire uses i16 elements (-1/0), not bool bytes. NmxWriteMessage.cs:307 writes (short)-1 for true. NmxSubscriptionMessage.cs:282 decodes width=sizeof(short). Doc (design/40-protocol-invariants.md:120) shows 2N for BoolArray which is consistent but doesn't say the encoding is 0xFFFF/0x0000 two's complement. Worth noting.
  • [RESOLVED] [MINOR] RegisterEngine opnum 3 has the same signature as INmxService::RegisterEngine — but the INmxService2 interface re-declares new versions of all base methods (NmxComContracts.cs:55-73). This means COM proxy/stub for INmxService2 exposes its own opnum table starting at 3, not inheriting opnums from INmxService. Doc (design/40-protocol-invariants.md:19) says "Opnums are sequential across the inheritance" — strictly speaking, with new declarations the C# vtable has its own slots. This needs a sentence saying "in the IDL these are sequential because INmxService2 : INmxService, but in the .NET interop they are re-declared new". Otherwise the Rust port may misinterpret what is bound.
  • [RESOLVED] [NIT] Reference Result message tail described as "tail(16 zero)" (design/40-protocol-invariants.md:191). NmxReferenceRegistrationResultMessage.cs not read but per CLAUDE.md should preserve verbatim — doc claim "all zero" needs a Frida-capture citation, not assertion.

design/50-error-model.md

  • [RESOLVED] [BLOCKER] 0x80004021 mapping is fabricated for "regular operations" (design/50-error-model.md:130). The .NET reference NEVER emits 0x80004021 from a non-secured path — it only appears in MxNativeSession.WriteSecuredAsync as the explanation for the NotSupportedException thrown locally before any wire call (src/MxNativeClient/MxNativeSession.cs:219-220). The "regular operation" stale-handle code observed across all captures is 0x80070057 E_INVALIDARG (e.g. captures/probe-add-remove.log:7, captures/008-write-test-int-same-value/harness.log:7, plus the LmxProxy.dll decompile returns 0x80070057 for stale handles at analysis/ghidra/exports/LmxProxy.dll.item-helper-decompile.md:60,75,88,164). The StaleItem arm of the split is unsupported by evidence; the design even contradicts itself one row below by mapping 0x80070057 to InvalidArgument. Pick one.
  • [RESOLVED] [BLOCKER] WriteSecuredForbidden cannot ever be returned by the Rust port as designed (design/50-error-model.md:101,129). The .NET reference does not surface 0x80004021 from a wire response — WriteSecuredAsync never reaches the wire, it throw new NotSupportedException(...) (src/MxNativeClient/MxNativeSession.cs:218-221). For Rust parity this should map to Error::Unsupported(...) at the API boundary, not a runtime HRESULT translation. The SecurityError::AccessDenied { detail } path (design/50-error-model.md:115) is also unreachable through the proven stack — captures 111/112 show the same 0x80004021 even after auth changes (captures/112-frida-write-secured-auth-verified-protectedvalue1/frida-events.tsv:69).
  • [RESOLVED] [MAJOR] 0x800706BA is not transient in the .NET reference — is_retryable=true is wrong (design/50-error-model.md:135,200). Every observed instance is a structural callback-marshalling failure, not a flapping NmxSvc: analysis/proxy/managed-registerengine2-callback-probe.txt:8, …-loopback-probe.txt:8, …-fixed-port-probe.txt:8 and docs/NMX-COM-Contracts.md:592 all describe it as the "no SYN to advertised port" outcome after security bindings are added — a config bug, not a retryable transient. Retrying loops forever.
  • [RESOLVED] [MAJOR] is_retryable() for Connection(TcpConnect) (design/50-error-model.md:188) glosses over io::ErrorKind::AddrNotAvailable / HostUnreachable / ConnectionRefused vs TimedOut. Retrying a name-not-found immediately is a hot-loop bug. ConnectionError::TcpConnect carries the io::Error (design/50-error-model.md:59) — is_retryable must inspect kind() like Io(_) does on line 206, not blanket-yes.
  • [RESOLVED] [MAJOR] MxStatusCategory and MxStatusSource are leaked into Error::Status without #[non_exhaustive] (design/50-error-model.md:45). They are plain Rust enums in the design but mirror short-valued .NET enums (src/MxNativeCodec/MxStatus.cs:3,17) where AVEVA could legally introduce new categories — Unknown=-1 already implies open-set. Without #[non_exhaustive], downstream match statements lock the API. Note also MxStatusSource exposes six values (RequestingLmxRespondingAutomationObject) — the Rust port should mirror these names exactly; the design never lists them.
  • [RESOLVED] [MAJOR] Error::Status { detail: i16 } drops MxStatus.Success (src/MxNativeCodec/MxStatus.cs:29). The .NET MxStatus is a 4-tuple (Success, Category, DetectedBy, Detail), and Success=-1 is the documented "OK" sentinel (MxStatus.cs:36-58). The Rust error model loses the Success short, breaking byte-parity diagnostics demanded by CLAUDE.md ("Preserve unknown bytes").
  • [RESOLVED] [MAJOR] ConfigurationError category is non-retryable per the recovery table (design/50-error-model.md:201), but detail=21 ("Invalid reference", src/MxNativeCodec/MxStatus.cs:76) is a cold-cache miss that becomes valid after Galaxy resolver refresh — the .NET path retries by re-resolving (MxNativeSession.ResolveTagAsync is called every operation, src/MxNativeClient/MxNativeSession.cs:173,196,232,255). Mapping the entire ConfigurationError category to is_retryable=false is too coarse.
  • [RESOLVED] [MAJOR] MxValueKind Debug derive is not guaranteed by the codec (design/50-error-model.md:30). The .NET enum MxValueKind (src/MxNativeCodec/MxValueKind.cs:3-18) is the spec; the Rust port must explicitly #[derive(Debug)] and the design's expected:?, actual:? format (line 29) requires it. Not a blocker if obeyed, but the design must state it — neither 10-raw-layer.md nor 50-error-model.md constrain the codec to derive Debug.
  • [RESOLVED] [MAJOR] RpcError is exposed in the public ConnectionError::OxidResolve { source: RpcError } (design/50-error-model.md:61) but RpcError is defined only in the raw layer (design/10-raw-layer.md:282-285) with no documented std::error::Error impl, no Display, and no #[non_exhaustive] declaration cited. #[source] requires std::error::Error. The design must promote RpcError to a public, stable, Error-implementing type or wrap it.
  • [RESOLVED] [MINOR] Cancellation policy on UnAdvise failure is unspecified (design/50-error-model.md:145). "Best-effort" with no statement about whether the failure is logged, traced, or silently dropped. The .NET reference logs to mx.unadvise.error (captures/probe-add-remove.log:7) — the Rust port should commit to tracing::warn! and say so.
  • [RESOLVED] [MINOR] Panic policy is incomplete (design/50-error-model.md:154-159). clippy::panic = deny does not cover unreachable!(), todo!(), indexing panics (a[i]), arithmetic overflow panics, or slice bounds. Add clippy::indexing_slicing, clippy::unreachable, clippy::todo, clippy::arithmetic_side_effects (or document why omitted). Test override "via #[cfg(test)]" is vague — #![cfg_attr(test, allow(clippy::unwrap_used))] is the actual pattern.
  • [RESOLVED] [NIT] 0x8001011D CallbackObjRefRejected mapping is supported but the cite is thin: it appears only in probe-narrative docs (docs/NMX-COM-Contracts.md:591), not as a logged HRESULT in captures/. The design should cite docs/NMX-COM-Contracts.md:590-594 directly so future maintainers can find it.

design/60-roadmap.md

  • [RESOLVED] [BLOCKER] M1 DoD claims "every Frida-captured write/advise/subscribe body in captures/0NN-frida-* round-trips byte-identical" but several captures contain unresolved native behaviour the .NET reference itself does not yet decode (design/60-roadmap.md:35). Evidence: captures/079-082, 094 are buffered-advise scenarios where work_remain.md:177-181 says the codec layout for buffered batches is unproven (provider only emits single-sample batches), and 70-risks-and-open-questions.md:21-26 (R2) explicitly defers buffered batch parity. 036 (single-token WriteSecured) returns 0x80004021 with no payload sent (R6, line 53-58). 077-078 (suspend/activate) trigger conditions are unknown (R5). Round-trip-against-themselves is achievable, but "byte-identical to the .NET reference's encode" is not, because the reference does not encode these flows.
  • [RESOLVED] [BLOCKER] M5 DoD requires the ASB type matrix to cover Boolean, Int32, Float, Double, String, DateTime, Duration, and the corresponding arrays "matching work_remain.md:108-113" — but work_remain.md:109 only proves "deployed array tags," not all eight scalar arrays (design/60-roadmap.md:71). The DoD over-states what the .NET reference has actually proven; less-common ASB types are explicitly deferred ("Remaining work is adding less common ASB types only as needed"). Citing the line in work_remain.md as authority for a stronger claim than the file makes is a dependency on unproven behaviour.
  • [RESOLVED] [BLOCKER] M6 DoD "cargo bench shows codec encode/decode latency comparable to or faster than the .NET reference" directly contradicts the explicit V1 non-goal "cargo bench numbers as gating criteria. We measure but don't gate beyond M6's loose acceptance bar." (design/60-roadmap.md:82 vs 60-roadmap.md:149). Also, "comparable to or faster than" has no defined comparison harness — the .NET reference has no microbench project. The DoD is unmeasurable as written. Same milestone also asserts "live subscribe under churn does not allocate per-message" with no allocation-counting tooling specified (compare to 70-risks-and-open-questions.md:102-108 R12 which only aims for "< 5 allocations per write at steady state").
  • [RESOLVED] [MAJOR] M2 DoD "non-zero partnerVersion against a running AVEVA install" is boolean-trivial and re-proves what docs/DotNet10-Native-Library-Plan.md:64-73 already established (partner_version=6) (design/60-roadmap.md:43). It does not exercise RegisterEngine2, callback OBJREF export, or actually receiving a status frame, despite the prose two lines above (line 41) listing those as in-scope. The DoD is weaker than what M2 promises to deliver and cannot detect regressions in the callback exporter — which is the hardest part of M2 per the .NET evidence (MxNativeClient had to hand-roll INmxSvcCallback/IRemUnknown).
  • [RESOLVED] [MAJOR] Sequencing claim "M5 can run in parallel with M3/M4" is partially false (design/60-roadmap.md:142,145). M5 says mxaccess::Session over AsbTransport (line 68) — the Session type, recovery policy, Stream<Item = DataChange>, and Transport trait are all defined in M4 (lines 55-60). M5 cannot land its public surface before M4 lands the Session abstraction. The "ASB does not depend on DCE/RPC" justification only covers the transport, not the shared Session. This contradicts 30-crate-topology.md:64-65 where mxaccess (top-level) sits below both transports.
  • [RESOLVED] [MAJOR] Cross-implementation parity assumes the .NET reference is correct, but work_remain.md:170-174 flags the completion-only byte mapping (0x00/0x41/0xEF) as unmapped and MXSTATUS_PROXY[] conversion as missing (design/60-roadmap.md:96-102). Any Rust port that "matches the .NET reference" inherits the same gap — passing parity tests does not mean correct behaviour. The roadmap should mark these as "preserved verbatim" rather than green-checked by parity. R3/R4 acknowledge this but the validation strategy section does not.
  • [RESOLVED] [MAJOR] Cross-platform drift: roadmap line 153 says "Linux is a stretch goal sitting behind feature flags" but 30-crate-topology.md:91 claims mxaccess-galaxy is "All Rust targets (TDS works cross-platform via tiberius)" and the codec is also "All Rust targets" (line 82), and 70-risks-and-open-questions.md:128-134 Q3 says "Linux-best-effort." Three documents, three different positions. The roadmap is the most restrictive; either it is wrong or the topology over-promises.
  • [RESOLVED] [MAJOR] Live-probe CI story is unspecified. M2/M3/M4/M5 DoDs all require a "running AVEVA install" + Galaxy DB + MX_LIVE env (60-roadmap.md:43,51,62,71,93-94). There is no description of how this CI lane is provisioned — AVEVA System Platform is a licensed Windows-only install with a SQL Galaxy. Without a hosted runner story, every milestone's DoD reduces to "the author ran it once on their workstation." This is not a regression-detecting test line.
  • [RESOLVED] [MINOR] M3 DoD cites captures/022-frida-int-write* and captures/077-frida-advise* for byte-identical comparison (60-roadmap.md:51), but 077 is 077-frida-suspend-advised-scanstate (per the directory listing) — a suspend capture, not an advise capture. The advise/subscribe captures are 058-060, 077 actually exercises the unproven Suspend path (R5). Citation appears wrong or aspirational.
  • [RESOLVED] [MINOR] M0 DoD includes cargo publish --dry-run -p mxaccess-codec (60-roadmap.md:16) but 30-crate-topology.md:269 and Q2 (70-risks-and-open-questions.md:120-126) flag that the license is unconfirmed and there is no LICENSE in the repo. cargo publish --dry-run will refuse without license + license-file resolved. M0 cannot complete cleanly until Q2 is settled, but the dependency table does not reference Q2.
  • [RESOLVED] [NIT] tests/fixtures/ populated from captures/ via "junction or copy" (60-roadmap.md:12, 30-crate-topology.md:29) — junctions on Windows do not survive git clone on Linux/macOS, breaking the cross-platform stretch goal at the fixture-loading layer. Pick one.

design/70-risks-and-open-questions.md

  • [RESOLVED] [BLOCKER] mxaccess-asb-soap framing risk is mis-scoped — the .NET reference uses NetTcpBinding with default BinaryMessageEncoder (.NET Binary XML / NBFX / [MC-NBFX]+[MC-NBFS]), NOT SOAP/XML. R1 talks about hand-rolling [MS-NMF] (~1500 LoC) but never lists implementing the binary dictionary/string-table codec, and 30-crate-topology.md:130 pins quick-xml for an XML parse path that does not exist on the wire. The risk register should record "no XML on the wire — must implement [MC-NBFX] decoder; quick-xml is the wrong dep" as its own R-item (design/70-risks-and-open-questions.md:7-18). Evidence: src/MxAsbClient/MxAsbDataClient.cs:663 new NetTcpBinding(SecurityMode.None) with no message-encoding override → WCF defaults to binary; design/30-crate-topology.md:130 lists quick-xml. Currently missing.
  • [RESOLVED] [BLOCKER] recordCount != 1 → throw invariant has no risk entry. src/MxNativeCodec/NmxSubscriptionMessage.cs:71-74 hard-rejects any DataUpdate with more than one record. R2 is about the missing fixture, but the much bigger risk — that the codec will panic in production the first time AVEVA emits a multi-record 0x33 — is unrecorded. Either lift the constraint defensively or add this as its own R-item with a "behavior on rejection" plan (design/70-risks-and-open-questions.md:20-26). Evidence: src/MxNativeCodec/NmxSubscriptionMessage.cs:71-74; design/40-protocol-invariants.md:303 acknowledges the gap but only as a fixture problem.
  • [RESOLVED] [BLOCKER] 0x80004021 → StaleItem mapping in 50-error-model.md:130 is unevidenced. R6 admits "the reason is unknown" for secured writes, then the error model casually maps the same HRESULT on regular operations to a brand-new StaleItem enum that no capture, decompile, or .NET reference produces. This is invented protocol behavior — directly violates CLAUDE.md "do not fabricate protocol behavior" and is not flagged (design/70-risks-and-open-questions.md:52-58). Evidence: design/50-error-model.md:130 — StaleItem synthesised; no fixture cited; design/40-protocol-invariants.md:270 only documents the secured-write path.
  • [RESOLVED] [MAJOR] R3/R4 "Settles when" depends on Ghidra output analysis/ghidra/aaDCT tables that the doc itself admits is "currently absent." work_remain.md:173-174 reports "Current available decompiled/Ghidra outputs did not expose a mapping table for completion-only bytes." There is no plan, owner, or alternate path (e.g. native callback frida capture mentioned in work_remain.md:170). This is indefinitely punted — should be relabeled "indefinitely deferred" like the OperationComplete row in the evidence-gaps table, and stripped of fake settle criteria (design/70-risks-and-open-questions.md:34, 42).
  • [RESOLVED] [MAJOR] No risk for Drop-time async cleanup hazards. Q6 covers Clone semantics but the design at 20-async-layer.md:70 and 50-error-model.md:145-146 openly admits drop fires tokio::spawn for UnAdvise/UnregisterEngine — that requires a Tokio runtime to be alive at drop time. Drop in a sync context (e.g. final teardown after Runtime::shutdown_timeout) will silently leak the unadvise/unregister and leak server-side handles in NMX. This is a correctness hazard and not in R1R12 (design/70-risks-and-open-questions.md:152-158). Evidence: design/00-overview.md:38 "No spawn from inside Drop" directly contradicts design/20-async-layer.md:70 and 50-error-model.md:145.
  • [RESOLVED] [MAJOR] Crypto/auth crate version-drift risk is missing. 30-crate-topology.md:130 pins rc4, sha-1, md-5, num-bigint — all of which are at minimum-maintenance / deprecation watchlist (rc4 is yanked-adjacent, sha-1 v0.10 is the last RustCrypto release with a deprecation warning). Combined with 10-raw-layer.md:252 "Do not pull ring — hand-roll MD4," the auth surface area depends on ~5 marginal crates. No R-item tracks "what happens when one of these is yanked / advisory'd in CI." Currently missing.
  • [RESOLVED] [MAJOR] R1 and R12 have inverted severity. R1 is "hand-roll [MS-NMF] reliable-session, NBFX/NBFS dictionary codec, DH key agreement (~1500 LoC, plus the entire WCF binary message encoder you forgot)" — that's the entire ASB data plane. R12 is BytesMut::with_capacity micro-optimization. Treating them as peer entries in an unsorted list misrepresents the project's blocker surface. The doc has no severity tier; introduce one or reorder (design/70-risks-and-open-questions.md:7, 102).
  • [RESOLVED] [MAJOR] Q1 says "sibling rust/" but no workspace exists. Glob of rust/ confirms zero files; CLAUDE.md itself hedges ("when it is started"). The "best answer" is presented as confirmed but is in fact still a question — and M0 cannot start without it. Either escalate to an explicit decision item with an owner, or stop calling it answered (design/70-risks-and-open-questions.md:112-118).
  • [RESOLVED] [MINOR] Q3 contradicts crate topology. Q3 says "Linux-best-effort … macOS unsupported in V1." 00-overview.md:35 describes "Windows-first, cross-platform-aware" with ASB SOAP framing portable. 30-crate-topology.md:130 lists no cfg(windows) gating on mxaccess-asb-soap deps. So either the soap crate compiles on Linux (contradicting Q3 best-effort) or it doesn't (contradicting the topology). Resolve in one place (design/70-risks-and-open-questions.md:128-134).
  • [RESOLVED] [MINOR] "No risk: NDR-bridge" and "No risk: custom TLB" are asserted with docs/Transport-Correlation.md and "the .NET reference" as citations but no :line numbers and no Ghidra/capture artifact. CLAUDE.md mandates evidence; the section that proudly declares non-issues is the only section in the file with weaker citations than the risks themselves (design/70-risks-and-open-questions.md:175-187).
  • [RESOLVED] [MINOR] Evidence-gaps table omits two fixtures present in work_remain.md: (a) source-level no-communication evidence (work_remain.md:198) and (b) live partial-cleanup behavior after channel failure (work_remain.md:196-197). Both are open in work_remain.md but missing from the gap table (design/70-risks-and-open-questions.md:164-171).