fe2a6db786
rust / build / test / clippy / fmt (push) Has been cancelled
Layout:
- src/ .NET 10 x64 reference: MxNativeCodec, MxNativeClient,
MxAsbClient, probes, tests, harnesses. Executable spec.
- design/ Architectural plan for the Rust port (M0–M6), error
model, protocol invariants, risks (R1–R16), adversarial
review log (review.md).
- rust/ Rust workspace. M0 skeleton + M1 codec parity.
mxaccess-codec: 215 unit tests + 2 cross-implementation
parity tests (byte-identical against .NET reference).
Other crates are M0 stubs awaiting M2+.
- captures/ Frida + netsh + pcap evidence per CLAUDE.md
("captures are evidence, not throwaway logs").
- analysis/ Decompiled C# (frida/proxy/decompiled-*),
Ghidra exports for native DLLs (`exports/` only —
working state at `projects/` and AVEVA's input
binaries at `input/` are gitignored).
- docs/ Reverse-engineering reference docs.
- tools/ Setup-LiveProbeEnv.ps1 (Infisical credential fetcher),
Compute-Crc.ps1 (.NET parity helper).
- .github/workflows/ Rust CI: fmt + build + test + clippy on Windows.
- LICENSE MIT (Joseph Doherty, 2026).
Verified:
- cargo test --workspace → 217 passed (215 unit + 2 .NET parity), 0 failed
- cargo clippy --workspace -- -D warnings → clean
- cargo fmt --all -- --check → clean
- cargo publish --dry-run -p mxaccess-codec → packages cleanly
Excluded from history (see .gitignore):
- **/bin, **/obj, **/target — build artifacts
- analysis/ghidra/projects/ — Ghidra working state (regenerable)
- analysis/ghidra/input/ — AVEVA proprietary DLLs (vendor IP)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
305 lines
16 KiB
Markdown
305 lines
16 KiB
Markdown
# Error model
|
|
|
|
## Goals
|
|
|
|
- No `HRESULT` bare integers in the public API.
|
|
- Every operation returns `Result<T, mxaccess::Error>`.
|
|
- `Error` is `#[non_exhaustive]` so adding variants is non-breaking.
|
|
- Every error variant carries enough context to debug without re-running.
|
|
- No panics in the public API.
|
|
|
|
## Public `mxaccess::Error`
|
|
|
|
```rust
|
|
#[derive(Debug, thiserror::Error)]
|
|
#[non_exhaustive]
|
|
pub enum Error {
|
|
#[error("connection: {0}")]
|
|
Connection(#[from] ConnectionError),
|
|
|
|
#[error("authentication: {0}")]
|
|
Auth(#[from] AuthError),
|
|
|
|
#[error("protocol: {0}")]
|
|
Protocol(#[from] ProtocolError),
|
|
|
|
#[error("configuration: {0}")]
|
|
Configuration(#[from] ConfigError),
|
|
|
|
#[error("type mismatch on {reference}: expected {expected:?}, got {actual:?}")]
|
|
TypeMismatch { reference: Arc<str>, expected: MxValueKind, actual: MxValueKind },
|
|
// The `{expected:?} {actual:?}` formatting above requires `MxValueKind: Debug`.
|
|
// The codec layer guarantees this: `MxValueKind`, `MxValue`, `MxStatus`,
|
|
// `MxStatusCategory`, and `MxStatusSource` all `#[derive(Debug)]` (this is a
|
|
// public-API requirement of the codec crate, not just a convenience). The
|
|
// .NET reference enum at src/MxNativeCodec/MxValueKind.cs:3-18 is the spec
|
|
// for the variants; the Rust port reproduces them and pins `Debug` as part
|
|
// of the contract so `Error::Display` and `tracing` fields render usefully.
|
|
|
|
#[error("security: {0}")]
|
|
Security(#[from] SecurityError),
|
|
|
|
#[error("unsupported on {transport:?} transport: {operation}")]
|
|
Unsupported { operation: Cow<'static, str>, transport: TransportKind },
|
|
|
|
#[error("operation timed out after {0:?}")]
|
|
Timeout(Duration),
|
|
|
|
#[error("operation cancelled")]
|
|
Cancelled,
|
|
|
|
#[error("status: success={success} category={category:?} source={source:?} detail={detail}")]
|
|
Status {
|
|
/// `MxStatus.Success` (`short`). `-1` is the documented OK sentinel
|
|
/// (src/MxNativeCodec/MxStatus.cs:29,36-58). Carried verbatim for
|
|
/// byte-parity diagnostics; consumers usually inspect `category` first.
|
|
success: i16,
|
|
category: MxStatusCategory,
|
|
source: MxStatusSource,
|
|
detail: i16,
|
|
},
|
|
|
|
#[error("io: {0}")]
|
|
Io(#[from] std::io::Error),
|
|
}
|
|
```
|
|
|
|
Sub-errors (`ConnectionError`, `AuthError`, `ProtocolError`, `ConfigError`, `SecurityError`) are similar `thiserror`-derived `#[non_exhaustive]` enums. Sources preserved via `#[source]`.
|
|
|
|
`MxStatusCategory` and `MxStatusSource` (re-exported from `mxaccess-codec`) are likewise `#[non_exhaustive]` — AVEVA may legally introduce new short-valued category/source variants, and the .NET reference already has `Unknown=-1` open-set sentinels (src/MxNativeCodec/MxStatus.cs:3,17). Without `#[non_exhaustive]`, downstream `match` statements would freeze the API on the first new category. The Rust port mirrors the **seven** `MxStatusSource` values one-for-one as named in the .NET enum — `Unknown=-1, RequestingLmx=0, RespondingLmx=1, RequestingNmx=2, RespondingNmx=3, RequestingAutomationObject=4, RespondingAutomationObject=5` (src/MxNativeCodec/MxStatus.cs:17-26). `DetectedBy` is essential for diagnostics: it tells the consumer which layer (requesting LMX vs responding NMX vs the automation object itself) detected the fault. Verified against wwtools/mxaccesscli/docs/api-notes.md:107-136 — every `MxStatusInfo[]` entry exposes `DetectedBy`.
|
|
|
|
```rust
|
|
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
|
|
#[non_exhaustive]
|
|
#[repr(i16)]
|
|
pub enum MxStatusCategory {
|
|
Unknown = -1,
|
|
Ok = 0,
|
|
Pending = 1,
|
|
Warning = 2,
|
|
CommunicationError = 3,
|
|
ConfigurationError = 4,
|
|
OperationalError = 5,
|
|
SecurityError = 6,
|
|
SoftwareError = 7,
|
|
OtherError = 8,
|
|
}
|
|
|
|
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
|
|
#[non_exhaustive]
|
|
#[repr(i16)]
|
|
pub enum MxStatusSource {
|
|
Unknown = -1,
|
|
RequestingLmx = 0,
|
|
RespondingLmx = 1,
|
|
RequestingNmx = 2,
|
|
RespondingNmx = 3,
|
|
RequestingAutomationObject = 4,
|
|
RespondingAutomationObject = 5,
|
|
}
|
|
```
|
|
|
|
```rust
|
|
#[derive(Debug, thiserror::Error)]
|
|
#[non_exhaustive]
|
|
pub enum ConnectionError {
|
|
#[error("tcp connect to {addr} failed")]
|
|
TcpConnect { addr: SocketAddr, #[source] source: std::io::Error },
|
|
#[error("OXID resolve failed")]
|
|
OxidResolve { #[source] source: RpcError },
|
|
// `RpcError` is a public, stable, `std::error::Error`-implementing type
|
|
// defined in the `mxaccess-rpc` crate (per the topology in 30-crate-topology.md)
|
|
// and re-exported here. Its declaration is:
|
|
//
|
|
// #[derive(Debug, thiserror::Error)]
|
|
// #[non_exhaustive]
|
|
// pub enum RpcError { /* DCE/RPC framing/PDU/NDR variants */ }
|
|
//
|
|
// Required because `#[source]` on the `OxidResolve` variant above demands
|
|
// `std::error::Error + 'static`. The variant set is documented in the raw
|
|
// layer; this crate guarantees only that `RpcError: std::error::Error +
|
|
// Send + Sync + Debug + Display + 'static` and is `#[non_exhaustive]` so
|
|
// new RPC failure modes can be added without a breaking change.
|
|
#[error("RPC server unavailable")]
|
|
ServerUnavailable,
|
|
#[error("callback proxy/stub not registered (REGDB_E_CLASSNOTREG)")]
|
|
CallbackProxyMissing,
|
|
#[error("engine not registered (UninitializedObject)")]
|
|
EngineNotRegistered,
|
|
}
|
|
|
|
#[derive(Debug, thiserror::Error)]
|
|
#[non_exhaustive]
|
|
pub enum AuthError {
|
|
#[error("NTLM rejected: {reason}")]
|
|
Ntlm { reason: String },
|
|
#[error("ASB DH handshake failed: {reason}")]
|
|
AsbHandshake { reason: String },
|
|
#[error("DPAPI shared secret unavailable: {reason}")]
|
|
SharedSecret { reason: String },
|
|
}
|
|
|
|
#[derive(Debug, thiserror::Error)]
|
|
#[non_exhaustive]
|
|
pub enum ProtocolError {
|
|
#[error("decode at offset {offset} ({reason}); buffer len {buffer_len}")]
|
|
Decode { offset: usize, reason: &'static str, buffer_len: usize },
|
|
#[error("inner length {declared} does not match body length {actual}")]
|
|
InnerLengthMismatch { declared: i32, actual: usize },
|
|
#[error("unexpected opcode {0:#x}")]
|
|
UnexpectedOpcode(u8),
|
|
#[error("envelope mismatch: {reason}")]
|
|
EnvelopeMismatch { reason: &'static str },
|
|
}
|
|
|
|
#[derive(Debug, thiserror::Error)]
|
|
#[non_exhaustive]
|
|
pub enum ConfigError {
|
|
/// Covers both ill-formed arguments and stale/unknown item handles.
|
|
/// The proven x86 stack returns `0x80070057 E_INVALIDARG` for stale
|
|
/// handles (captures/008-write-test-int-same-value/harness.log:7,
|
|
/// analysis/ghidra/exports/LmxProxy.dll.item-helper-decompile.md:60,75,88,164).
|
|
/// Discriminate via `detail` if a finer split is justified by future evidence.
|
|
#[error("invalid argument: {detail}")]
|
|
InvalidArgument { detail: String },
|
|
#[error("galaxy resolver: {reason}")]
|
|
Galaxy { reason: String },
|
|
}
|
|
|
|
#[derive(Debug, thiserror::Error)]
|
|
#[non_exhaustive]
|
|
pub enum SecurityError {
|
|
#[error("callback OBJREF rejected (HRESULT 0x8001011D)")]
|
|
CallbackObjRefRejected,
|
|
#[error("verifier user token required for secured write")]
|
|
VerifierRequired,
|
|
}
|
|
```
|
|
|
|
## Mapping rules
|
|
|
|
The async layer translates raw protocol outcomes into typed errors at exactly one place: at the boundary where a `Result<_, RawError>` from `mxaccess-nmx` or `mxaccess-asb` becomes a `Result<_, Error>` exposed to the consumer.
|
|
|
|
| Source | Maps to |
|
|
|---|---|
|
|
| TCP connect failure | `Error::Connection(ConnectionError::TcpConnect { addr, source })` |
|
|
| OXID resolve failure | `Error::Connection(ConnectionError::OxidResolve { source })` |
|
|
| NTLM Type1/Type3 reject | `Error::Auth(AuthError::Ntlm { reason })` |
|
|
| DH handshake mismatch (ASB) | `Error::Auth(AuthError::AsbHandshake { reason })` |
|
|
| HRESULT `0x80070057` (incl. stale/unknown item handles) | `Error::Configuration(ConfigError::InvalidArgument { detail })` |
|
|
| HRESULT `0x80040154` | `Error::Connection(ConnectionError::CallbackProxyMissing)` |
|
|
| HRESULT `0x8007139F` | `Error::Connection(ConnectionError::EngineNotRegistered)` |
|
|
| HRESULT `0x8001011D` | `Error::Security(SecurityError::CallbackObjRefRejected)` (citation: `docs/NMX-COM-Contracts.md:590-594`, the full surrounding paragraph; no capture currently logs this HRESULT, so promotion to a typed error is based on probe analysis rather than a recorded wire fixture — re-examine if a future capture contradicts the narrative) |
|
|
| HRESULT `0x800706BA` | `Error::Connection(ConnectionError::ServerUnavailable)` |
|
|
| HRESULT `0x80004021` from a Write operation | preserved verbatim as `Error::Status { ..raw HRESULT carried }` — no synthesized typed error. Note: this HRESULT was previously mapped to a synthesized "single-token `WriteSecured` not supported" error; that mapping has been **removed** per `wwtools/mxaccesscli/` verification. Real MxAccess `WriteSecured` always takes two user ids (`(currentUserId, verifierUserId, value)`) and the LMX proxy CLI exposes single-user secured writes as `currentUserId == verifierUserId`. The 0x80004021 in `MxNativeSession.WriteSecuredAsync` (src/MxNativeClient/MxNativeSession.cs:218-221) is a defect of that .NET reimplementation, not a real LMX constraint — see R6 in `70-risks-and-open-questions.md` |
|
|
| Decoder failure on inbound frame | `Error::Protocol(ProtocolError::Decode { ... })` |
|
|
| `MxStatus` with `category != Ok` from `read`/`write`/`subscribe` | `Error::Status { success, category, source, detail }` (the full 4-tuple from the .NET `MxStatus` record at src/MxNativeCodec/MxStatus.cs:28-33 — `success` is the documented OK sentinel `-1` and is carried for byte-parity diagnostics per CLAUDE.md "preserve unknown bytes") |
|
|
| `MxStatus` with `category != Ok` on a `DataChange` from a `Subscription` | **not** an error — set on `DataChange.status`, surfaced through the stream |
|
|
|
|
The split between operation-status-as-error and subscription-status-as-data is the only sensible mapping. A subscription frame with `category=Warning, detail=stale` is data the caller wants to receive and inspect; it is not an exception. An operation that returns `category=ConfigurationError, detail=21 (invalid reference)` is a failure to produce a value and should be surfaced as `Err`.
|
|
|
|
## Cancellation semantics
|
|
|
|
- **Drop**: dropping a future cancels it. The library does not `block_on` inside drop.
|
|
- **Subscription drop**: holds a `tokio::sync::oneshot::Sender<UnAdviseRequest>` to signal `UnAdvise` from the task that owns the connection. On drop, the sender is consumed; the connection task fires `UnAdvise` best-effort. The drop returns immediately; no async cleanup is awaited. If the best-effort `UnAdvise` fails (e.g. connection already gone), the failure is logged via `tracing::warn!` with the subscription handle and the underlying error. The drop completes regardless. (Log signature precedent: `captures/probe-add-remove.log:7`, which records the .NET `LMX_UnAdvise` per-handle warning emitted when the proxy reports a non-Ok status during teardown.)
|
|
- **Session shutdown**: `Session::shutdown(timeout)` returns `Ok(())` once `UnregisterEngine` completes or the timeout elapses. Drop without `shutdown` runs an unbounded best-effort cleanup spawn — fine for tests, suboptimal for production where `shutdown` is the recommendation.
|
|
- **`CancellationToken`**: long operations (`subscribe_buffered`, recovery, `connect`) accept an optional token. Cancellation surfaces as `Error::Cancelled`.
|
|
- **Timeout**: `tokio::time::timeout` works on every operation because `async fn`s are cancel-correct by construction.
|
|
|
|
## Panic policy
|
|
|
|
Public API panics only for invariants the type system enforces (e.g. an array length that cannot be wrong). In practice this means: **no `unwrap()`, `expect()`, or `panic!()` in non-test code paths**.
|
|
|
|
Lints:
|
|
- `clippy::unwrap_used = "deny"`
|
|
- `clippy::expect_used = "deny"`
|
|
- `clippy::panic = "deny"`
|
|
- `clippy::indexing_slicing = "deny"` — out-of-bounds `slice[i]` is a panic; codecs use `slice.get(i).ok_or(...)?` instead.
|
|
- `clippy::unreachable = "deny"` — `unreachable!()` is still a panic; force `Err(ProtocolError::...)` on supposedly-impossible decoder branches so the wire is never trusted to match the type system.
|
|
- `clippy::todo = "deny"` — `todo!()` must not ship in non-test code.
|
|
- `clippy::arithmetic_side_effects = "warn"` — overflow-on-debug is a real footgun, but the codec does legitimate `u8`/`u16` field arithmetic where the lint is noisy; `warn` keeps it visible without blocking honest cases. Use `wrapping_*`/`checked_*` explicitly when the intent is non-default.
|
|
|
|
Test modules opt out of the unwrap/expect/indexing rules via the actual pattern:
|
|
|
|
```rust
|
|
#![cfg_attr(test, allow(clippy::unwrap_used, clippy::expect_used, clippy::indexing_slicing))]
|
|
```
|
|
|
|
placed at the top of the test module (or `#[cfg(test)] mod tests { #![allow(...)] ... }` for inline test modules). CI rejects PRs that introduce any of the denied lints in non-test paths.
|
|
|
|
## Diagnostics surface
|
|
|
|
`Error::Display` is short and operator-friendly. `Error::source()` walks the cause chain. `Error::Debug` is verbose for logs.
|
|
|
|
```rust
|
|
match session.write("X.Y", v).await {
|
|
Ok(()) => {}
|
|
Err(e) => {
|
|
tracing::error!(error = ?e, error.source = ?e.source(), "write failed");
|
|
return Err(e);
|
|
}
|
|
}
|
|
```
|
|
|
|
For protocol-level decode failures, every `ProtocolError::Decode` carries:
|
|
- byte offset in the frame
|
|
- reason (`&'static str` — interned message)
|
|
- buffer length
|
|
|
|
This matches the .NET reference's tendency to throw `ArgumentException` with byte-offset messages.
|
|
|
|
## Recovery decisions
|
|
|
|
`Error::is_retryable() -> bool` informs the default `RecoveryPolicy`:
|
|
|
|
| Variant | Retryable? |
|
|
|---|---|
|
|
| `Connection(ServerUnavailable)` | no — `0x800706BA` in the proven stack is a structural callback-marshalling failure (no SYN to the advertised callback port after security bindings are added), not a flapping `NmxSvc.exe`. Retrying loops forever. Evidence: `analysis/proxy/managed-registerengine2-callback-probe.txt:8`, `docs/NMX-COM-Contracts.md:592`. |
|
|
| `Connection(TcpConnect)` | depends on the inner `io::Error.kind()` — same dispatch as `Io(_)` below: `WouldBlock`/`Interrupted`/`TimedOut`/`ConnectionReset` → yes; `AddrNotAvailable`/`HostUnreachable`/`ConnectionRefused` → no. Implementation: `is_retryable()`'s `Connection(TcpConnect { source, .. })` arm delegates to `source.kind()` rather than blanket-yes; otherwise a name-not-found hot-loops. |
|
|
| `Connection(OxidResolve)` | yes |
|
|
| `Connection(CallbackProxyMissing)` | no (config issue) |
|
|
| `Connection(EngineNotRegistered)` | yes (re-register) |
|
|
| `Auth(Ntlm)` | no (credentials are wrong) |
|
|
| `Auth(AsbHandshake)` | no |
|
|
| `Auth(SharedSecret)` | no |
|
|
| `Configuration(*)` | no |
|
|
| `TypeMismatch` | no |
|
|
| `Security(*)` | no |
|
|
| `Status` with `category=CommunicationError` | yes |
|
|
| `Status` with `category=Pending` | yes (with backoff) |
|
|
| `Status` with `category=ConfigurationError` | depends on `detail`: `detail == 21` ("Invalid reference", src/MxNativeCodec/MxStatus.cs:76) → yes, after a Galaxy-resolver-cache refresh — the .NET reference's per-operation `ResolveTagAsync` (src/MxNativeClient/MxNativeSession.cs:173,196,232,255) makes a stale-cache miss recoverable on the next attempt. All other `ConfigurationError` details → no. Implementation: `is_retryable()` matches the inner `detail` and returns `true` only for the resolver-refresh-eligible codes. |
|
|
| `Status` other categories | no by default; consumer can override |
|
|
| `Timeout` | yes |
|
|
| `Cancelled` | no |
|
|
| `Protocol(*)` | no — protocol-level decode bugs are not retryable |
|
|
| `Io(_)` | depends on `kind()`: `WouldBlock`/`Interrupted`/`TimedOut` → yes; everything else → no |
|
|
|
|
The default `RecoveryPolicy::exponential` calls `is_retryable()` before re-attempting and emits `RecoveryEvent::Failed { error, .. }` if not.
|
|
|
|
## Status as data on subscriptions
|
|
|
|
```rust
|
|
pub struct DataChange {
|
|
pub reference: Arc<str>,
|
|
pub value: MxValue,
|
|
pub quality: u16,
|
|
pub timestamp: SystemTime,
|
|
pub status: MxStatus,
|
|
}
|
|
```
|
|
|
|
Consumers inspect `status.category` to distinguish good/uncertain/bad data. The default `mxaccess` does not filter; if a consumer wants Ok-only data, they filter the stream themselves:
|
|
|
|
```rust
|
|
let mut sub = session.subscribe("X.Y").await?
|
|
.filter_map(|change| async move {
|
|
match change {
|
|
Ok(c) if c.status.category == MxStatusCategory::Ok => Some(Ok(c)),
|
|
Ok(_) => None, // drop non-Ok
|
|
Err(e) => Some(Err(e)),
|
|
}
|
|
});
|
|
```
|
|
|
|
This keeps `Error` reserved for actual failures and keeps stream items composable.
|