Layout:
- src/ .NET 10 x64 reference: MxNativeCodec, MxNativeClient,
MxAsbClient, probes, tests, harnesses. Executable spec.
- design/ Architectural plan for the Rust port (M0–M6), error
model, protocol invariants, risks (R1–R16), adversarial
review log (review.md).
- rust/ Rust workspace. M0 skeleton + M1 codec parity.
mxaccess-codec: 215 unit tests + 2 cross-implementation
parity tests (byte-identical against .NET reference).
Other crates are M0 stubs awaiting M2+.
- captures/ Frida + netsh + pcap evidence per CLAUDE.md
("captures are evidence, not throwaway logs").
- analysis/ Decompiled C# (frida/proxy/decompiled-*),
Ghidra exports for native DLLs (`exports/` only —
working state at `projects/` and AVEVA's input
binaries at `input/` are gitignored).
- docs/ Reverse-engineering reference docs.
- tools/ Setup-LiveProbeEnv.ps1 (Infisical credential fetcher),
Compute-Crc.ps1 (.NET parity helper).
- .github/workflows/ Rust CI: fmt + build + test + clippy on Windows.
- LICENSE MIT (Joseph Doherty, 2026).
Verified:
- cargo test --workspace → 217 passed (215 unit + 2 .NET parity), 0 failed
- cargo clippy --workspace -- -D warnings → clean
- cargo fmt --all -- --check → clean
- cargo publish --dry-run -p mxaccess-codec → packages cleanly
Excluded from history (see .gitignore):
- **/bin, **/obj, **/target — build artifacts
- analysis/ghidra/projects/ — Ghidra working state (regenerable)
- analysis/ghidra/input/ — AVEVA proprietary DLLs (vendor IP)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
16 KiB
Error model
Goals
- No
HRESULTbare integers in the public API. - Every operation returns
Result<T, mxaccess::Error>. Erroris#[non_exhaustive]so adding variants is non-breaking.- Every error variant carries enough context to debug without re-running.
- No panics in the public API.
Public mxaccess::Error
#[derive(Debug, thiserror::Error)]
#[non_exhaustive]
pub enum Error {
#[error("connection: {0}")]
Connection(#[from] ConnectionError),
#[error("authentication: {0}")]
Auth(#[from] AuthError),
#[error("protocol: {0}")]
Protocol(#[from] ProtocolError),
#[error("configuration: {0}")]
Configuration(#[from] ConfigError),
#[error("type mismatch on {reference}: expected {expected:?}, got {actual:?}")]
TypeMismatch { reference: Arc<str>, expected: MxValueKind, actual: MxValueKind },
// The `{expected:?} {actual:?}` formatting above requires `MxValueKind: Debug`.
// The codec layer guarantees this: `MxValueKind`, `MxValue`, `MxStatus`,
// `MxStatusCategory`, and `MxStatusSource` all `#[derive(Debug)]` (this is a
// public-API requirement of the codec crate, not just a convenience). The
// .NET reference enum at src/MxNativeCodec/MxValueKind.cs:3-18 is the spec
// for the variants; the Rust port reproduces them and pins `Debug` as part
// of the contract so `Error::Display` and `tracing` fields render usefully.
#[error("security: {0}")]
Security(#[from] SecurityError),
#[error("unsupported on {transport:?} transport: {operation}")]
Unsupported { operation: Cow<'static, str>, transport: TransportKind },
#[error("operation timed out after {0:?}")]
Timeout(Duration),
#[error("operation cancelled")]
Cancelled,
#[error("status: success={success} category={category:?} source={source:?} detail={detail}")]
Status {
/// `MxStatus.Success` (`short`). `-1` is the documented OK sentinel
/// (src/MxNativeCodec/MxStatus.cs:29,36-58). Carried verbatim for
/// byte-parity diagnostics; consumers usually inspect `category` first.
success: i16,
category: MxStatusCategory,
source: MxStatusSource,
detail: i16,
},
#[error("io: {0}")]
Io(#[from] std::io::Error),
}
Sub-errors (ConnectionError, AuthError, ProtocolError, ConfigError, SecurityError) are similar thiserror-derived #[non_exhaustive] enums. Sources preserved via #[source].
MxStatusCategory and MxStatusSource (re-exported from mxaccess-codec) are likewise #[non_exhaustive] — AVEVA may legally introduce new short-valued category/source variants, and the .NET reference already has Unknown=-1 open-set sentinels (src/MxNativeCodec/MxStatus.cs:3,17). Without #[non_exhaustive], downstream match statements would freeze the API on the first new category. The Rust port mirrors the seven MxStatusSource values one-for-one as named in the .NET enum — Unknown=-1, RequestingLmx=0, RespondingLmx=1, RequestingNmx=2, RespondingNmx=3, RequestingAutomationObject=4, RespondingAutomationObject=5 (src/MxNativeCodec/MxStatus.cs:17-26). DetectedBy is essential for diagnostics: it tells the consumer which layer (requesting LMX vs responding NMX vs the automation object itself) detected the fault. Verified against wwtools/mxaccesscli/docs/api-notes.md:107-136 — every MxStatusInfo[] entry exposes DetectedBy.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
#[non_exhaustive]
#[repr(i16)]
pub enum MxStatusCategory {
Unknown = -1,
Ok = 0,
Pending = 1,
Warning = 2,
CommunicationError = 3,
ConfigurationError = 4,
OperationalError = 5,
SecurityError = 6,
SoftwareError = 7,
OtherError = 8,
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
#[non_exhaustive]
#[repr(i16)]
pub enum MxStatusSource {
Unknown = -1,
RequestingLmx = 0,
RespondingLmx = 1,
RequestingNmx = 2,
RespondingNmx = 3,
RequestingAutomationObject = 4,
RespondingAutomationObject = 5,
}
#[derive(Debug, thiserror::Error)]
#[non_exhaustive]
pub enum ConnectionError {
#[error("tcp connect to {addr} failed")]
TcpConnect { addr: SocketAddr, #[source] source: std::io::Error },
#[error("OXID resolve failed")]
OxidResolve { #[source] source: RpcError },
// `RpcError` is a public, stable, `std::error::Error`-implementing type
// defined in the `mxaccess-rpc` crate (per the topology in 30-crate-topology.md)
// and re-exported here. Its declaration is:
//
// #[derive(Debug, thiserror::Error)]
// #[non_exhaustive]
// pub enum RpcError { /* DCE/RPC framing/PDU/NDR variants */ }
//
// Required because `#[source]` on the `OxidResolve` variant above demands
// `std::error::Error + 'static`. The variant set is documented in the raw
// layer; this crate guarantees only that `RpcError: std::error::Error +
// Send + Sync + Debug + Display + 'static` and is `#[non_exhaustive]` so
// new RPC failure modes can be added without a breaking change.
#[error("RPC server unavailable")]
ServerUnavailable,
#[error("callback proxy/stub not registered (REGDB_E_CLASSNOTREG)")]
CallbackProxyMissing,
#[error("engine not registered (UninitializedObject)")]
EngineNotRegistered,
}
#[derive(Debug, thiserror::Error)]
#[non_exhaustive]
pub enum AuthError {
#[error("NTLM rejected: {reason}")]
Ntlm { reason: String },
#[error("ASB DH handshake failed: {reason}")]
AsbHandshake { reason: String },
#[error("DPAPI shared secret unavailable: {reason}")]
SharedSecret { reason: String },
}
#[derive(Debug, thiserror::Error)]
#[non_exhaustive]
pub enum ProtocolError {
#[error("decode at offset {offset} ({reason}); buffer len {buffer_len}")]
Decode { offset: usize, reason: &'static str, buffer_len: usize },
#[error("inner length {declared} does not match body length {actual}")]
InnerLengthMismatch { declared: i32, actual: usize },
#[error("unexpected opcode {0:#x}")]
UnexpectedOpcode(u8),
#[error("envelope mismatch: {reason}")]
EnvelopeMismatch { reason: &'static str },
}
#[derive(Debug, thiserror::Error)]
#[non_exhaustive]
pub enum ConfigError {
/// Covers both ill-formed arguments and stale/unknown item handles.
/// The proven x86 stack returns `0x80070057 E_INVALIDARG` for stale
/// handles (captures/008-write-test-int-same-value/harness.log:7,
/// analysis/ghidra/exports/LmxProxy.dll.item-helper-decompile.md:60,75,88,164).
/// Discriminate via `detail` if a finer split is justified by future evidence.
#[error("invalid argument: {detail}")]
InvalidArgument { detail: String },
#[error("galaxy resolver: {reason}")]
Galaxy { reason: String },
}
#[derive(Debug, thiserror::Error)]
#[non_exhaustive]
pub enum SecurityError {
#[error("callback OBJREF rejected (HRESULT 0x8001011D)")]
CallbackObjRefRejected,
#[error("verifier user token required for secured write")]
VerifierRequired,
}
Mapping rules
The async layer translates raw protocol outcomes into typed errors at exactly one place: at the boundary where a Result<_, RawError> from mxaccess-nmx or mxaccess-asb becomes a Result<_, Error> exposed to the consumer.
| Source | Maps to |
|---|---|
| TCP connect failure | Error::Connection(ConnectionError::TcpConnect { addr, source }) |
| OXID resolve failure | Error::Connection(ConnectionError::OxidResolve { source }) |
| NTLM Type1/Type3 reject | Error::Auth(AuthError::Ntlm { reason }) |
| DH handshake mismatch (ASB) | Error::Auth(AuthError::AsbHandshake { reason }) |
HRESULT 0x80070057 (incl. stale/unknown item handles) |
Error::Configuration(ConfigError::InvalidArgument { detail }) |
HRESULT 0x80040154 |
Error::Connection(ConnectionError::CallbackProxyMissing) |
HRESULT 0x8007139F |
Error::Connection(ConnectionError::EngineNotRegistered) |
HRESULT 0x8001011D |
Error::Security(SecurityError::CallbackObjRefRejected) (citation: docs/NMX-COM-Contracts.md:590-594, the full surrounding paragraph; no capture currently logs this HRESULT, so promotion to a typed error is based on probe analysis rather than a recorded wire fixture — re-examine if a future capture contradicts the narrative) |
HRESULT 0x800706BA |
Error::Connection(ConnectionError::ServerUnavailable) |
HRESULT 0x80004021 from a Write operation |
preserved verbatim as Error::Status { ..raw HRESULT carried } — no synthesized typed error. Note: this HRESULT was previously mapped to a synthesized "single-token WriteSecured not supported" error; that mapping has been removed per wwtools/mxaccesscli/ verification. Real MxAccess WriteSecured always takes two user ids ((currentUserId, verifierUserId, value)) and the LMX proxy CLI exposes single-user secured writes as currentUserId == verifierUserId. The 0x80004021 in MxNativeSession.WriteSecuredAsync (src/MxNativeClient/MxNativeSession.cs:218-221) is a defect of that .NET reimplementation, not a real LMX constraint — see R6 in 70-risks-and-open-questions.md |
| Decoder failure on inbound frame | Error::Protocol(ProtocolError::Decode { ... }) |
MxStatus with category != Ok from read/write/subscribe |
Error::Status { success, category, source, detail } (the full 4-tuple from the .NET MxStatus record at src/MxNativeCodec/MxStatus.cs:28-33 — success is the documented OK sentinel -1 and is carried for byte-parity diagnostics per CLAUDE.md "preserve unknown bytes") |
MxStatus with category != Ok on a DataChange from a Subscription |
not an error — set on DataChange.status, surfaced through the stream |
The split between operation-status-as-error and subscription-status-as-data is the only sensible mapping. A subscription frame with category=Warning, detail=stale is data the caller wants to receive and inspect; it is not an exception. An operation that returns category=ConfigurationError, detail=21 (invalid reference) is a failure to produce a value and should be surfaced as Err.
Cancellation semantics
- Drop: dropping a future cancels it. The library does not
block_oninside drop. - Subscription drop: holds a
tokio::sync::oneshot::Sender<UnAdviseRequest>to signalUnAdvisefrom the task that owns the connection. On drop, the sender is consumed; the connection task firesUnAdvisebest-effort. The drop returns immediately; no async cleanup is awaited. If the best-effortUnAdvisefails (e.g. connection already gone), the failure is logged viatracing::warn!with the subscription handle and the underlying error. The drop completes regardless. (Log signature precedent:captures/probe-add-remove.log:7, which records the .NETLMX_UnAdviseper-handle warning emitted when the proxy reports a non-Ok status during teardown.) - Session shutdown:
Session::shutdown(timeout)returnsOk(())onceUnregisterEnginecompletes or the timeout elapses. Drop withoutshutdownruns an unbounded best-effort cleanup spawn — fine for tests, suboptimal for production whereshutdownis the recommendation. CancellationToken: long operations (subscribe_buffered, recovery,connect) accept an optional token. Cancellation surfaces asError::Cancelled.- Timeout:
tokio::time::timeoutworks on every operation becauseasync fns are cancel-correct by construction.
Panic policy
Public API panics only for invariants the type system enforces (e.g. an array length that cannot be wrong). In practice this means: no unwrap(), expect(), or panic!() in non-test code paths.
Lints:
clippy::unwrap_used = "deny"clippy::expect_used = "deny"clippy::panic = "deny"clippy::indexing_slicing = "deny"— out-of-boundsslice[i]is a panic; codecs useslice.get(i).ok_or(...)?instead.clippy::unreachable = "deny"—unreachable!()is still a panic; forceErr(ProtocolError::...)on supposedly-impossible decoder branches so the wire is never trusted to match the type system.clippy::todo = "deny"—todo!()must not ship in non-test code.clippy::arithmetic_side_effects = "warn"— overflow-on-debug is a real footgun, but the codec does legitimateu8/u16field arithmetic where the lint is noisy;warnkeeps it visible without blocking honest cases. Usewrapping_*/checked_*explicitly when the intent is non-default.
Test modules opt out of the unwrap/expect/indexing rules via the actual pattern:
#![cfg_attr(test, allow(clippy::unwrap_used, clippy::expect_used, clippy::indexing_slicing))]
placed at the top of the test module (or #[cfg(test)] mod tests { #![allow(...)] ... } for inline test modules). CI rejects PRs that introduce any of the denied lints in non-test paths.
Diagnostics surface
Error::Display is short and operator-friendly. Error::source() walks the cause chain. Error::Debug is verbose for logs.
match session.write("X.Y", v).await {
Ok(()) => {}
Err(e) => {
tracing::error!(error = ?e, error.source = ?e.source(), "write failed");
return Err(e);
}
}
For protocol-level decode failures, every ProtocolError::Decode carries:
- byte offset in the frame
- reason (
&'static str— interned message) - buffer length
This matches the .NET reference's tendency to throw ArgumentException with byte-offset messages.
Recovery decisions
Error::is_retryable() -> bool informs the default RecoveryPolicy:
| Variant | Retryable? |
|---|---|
Connection(ServerUnavailable) |
no — 0x800706BA in the proven stack is a structural callback-marshalling failure (no SYN to the advertised callback port after security bindings are added), not a flapping NmxSvc.exe. Retrying loops forever. Evidence: analysis/proxy/managed-registerengine2-callback-probe.txt:8, docs/NMX-COM-Contracts.md:592. |
Connection(TcpConnect) |
depends on the inner io::Error.kind() — same dispatch as Io(_) below: WouldBlock/Interrupted/TimedOut/ConnectionReset → yes; AddrNotAvailable/HostUnreachable/ConnectionRefused → no. Implementation: is_retryable()'s Connection(TcpConnect { source, .. }) arm delegates to source.kind() rather than blanket-yes; otherwise a name-not-found hot-loops. |
Connection(OxidResolve) |
yes |
Connection(CallbackProxyMissing) |
no (config issue) |
Connection(EngineNotRegistered) |
yes (re-register) |
Auth(Ntlm) |
no (credentials are wrong) |
Auth(AsbHandshake) |
no |
Auth(SharedSecret) |
no |
Configuration(*) |
no |
TypeMismatch |
no |
Security(*) |
no |
Status with category=CommunicationError |
yes |
Status with category=Pending |
yes (with backoff) |
Status with category=ConfigurationError |
depends on detail: detail == 21 ("Invalid reference", src/MxNativeCodec/MxStatus.cs:76) → yes, after a Galaxy-resolver-cache refresh — the .NET reference's per-operation ResolveTagAsync (src/MxNativeClient/MxNativeSession.cs:173,196,232,255) makes a stale-cache miss recoverable on the next attempt. All other ConfigurationError details → no. Implementation: is_retryable() matches the inner detail and returns true only for the resolver-refresh-eligible codes. |
Status other categories |
no by default; consumer can override |
Timeout |
yes |
Cancelled |
no |
Protocol(*) |
no — protocol-level decode bugs are not retryable |
Io(_) |
depends on kind(): WouldBlock/Interrupted/TimedOut → yes; everything else → no |
The default RecoveryPolicy::exponential calls is_retryable() before re-attempting and emits RecoveryEvent::Failed { error, .. } if not.
Status as data on subscriptions
pub struct DataChange {
pub reference: Arc<str>,
pub value: MxValue,
pub quality: u16,
pub timestamp: SystemTime,
pub status: MxStatus,
}
Consumers inspect status.category to distinguish good/uncertain/bad data. The default mxaccess does not filter; if a consumer wants Ok-only data, they filter the stream themselves:
let mut sub = session.subscribe("X.Y").await?
.filter_map(|change| async move {
match change {
Ok(c) if c.status.category == MxStatusCategory::Ok => Some(Ok(c)),
Ok(_) => None, // drop non-Ok
Err(e) => Some(Err(e)),
}
});
This keeps Error reserved for actual failures and keeps stream items composable.