ddebab2c2d
Self-contained doc at docs/F3-cross-domain-ntlm-recipe.md for whoever picks F3 up on hardware with two AD forests + a forest trust. Covers: - Lab topology (LAB-A resource forest with AVEVA install + LAB-B account forest with the probe user, bidirectional forest trust). - DC + DNS + trust + user provisioning steps (Install-ADDSForest, Add-DnsServerConditionalForwarderZone, New-ADTrust, New-ADUser). - Capture procedure for both the Rust and .NET probes under a `runas /netonly` cross-domain token, with Wireshark NTLMSSP guidance. - Fixture layout under crates/mxaccess-rpc/tests/fixtures/cross-domain-ntlm/. - Round-trip test skeleton (replay the captured Type 2 → regenerate Type 3 → assert byte-equality against the captured Type 3). - Redaction checklist for the captured bytes. - Why F3 is "evidence work" not "codec work" — the AV pair parser is shape-agnostic, so the codec path is already correct; the fixture is a regression net for any future drift. F3 entry in design/followups.md and R8 in design/70-risks-and-open-questions.md both now point at the recipe so a future contributor doesn't have to reconstruct the lab topology from the followup analysis alone. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
425 lines
46 KiB
Markdown
# Risks and open questions

This is the punch list of things that could break or are unproven. Each entry is tagged R(isk) or Q(uestion), with a current best answer and what would settle it.

## Protocol-level

### R1 — net.tcp / WCF framing **and binary message encoding** complexity

**Severity: P0** (project-blocker — entire ASB data plane, ~3000 LoC)

The .NET reference uses `System.ServiceModel.NetTcpBinding` for ASB (`src/MxAsbClient/MxAsbDataClient.cs:663`: `new NetTcpBinding(SecurityMode.None)` with no message-encoding override). With no override, WCF defaults to the **binary message encoder** — i.e. .NET Binary XML ([MC-NBFX]) with a static dictionary lookup ([MC-NBFS]) — **not** SOAP/XML. There is no Rust port of WCF, and `quick-xml` (or any other XML toolkit) is **not sufficient** to read or write these payloads: the body bytes are tokenised binary nodes that reference dictionary string IDs.

So the hand-rolled scope is two layers, not one:

1. **Framing** per [MS-NMF] (record types: preamble, preamble-ack, sized-envelope, end, fault) plus the reliable-session ack handling on the underlying `net.tcp` channel.
2. **Message encoding** per [MC-NBFX] (binary XML node tokens, length-prefixed strings, prefixed/typed attributes, end-element markers) **plus** [MC-NBFS] (the static dictionary that holds the SOAP/WS-Addressing/`IASBIDataV2`-action strings the encoder references by ID instead of inlining).
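
The framing layer's core primitive is small enough to sketch. The block below is a minimal, std-only illustration of the variable-length record-size encoding and a sized-envelope record as described in [MS-NMF]. The function names are hypothetical (this is not the `mxaccess-asb-nettcp` API), and the record-type value `0x06` is quoted from memory of the spec, so it should be re-checked against [MS-NMF] before anything depends on it.

```rust
/// Record type byte for a Sized Envelope record (assumption: value taken
/// from [MS-NMF]; verify against the spec before relying on it).
pub const SIZED_ENVELOPE_RECORD: u8 = 0x06;

/// Append `size` using the 7-bits-per-byte encoding: low-order group first,
/// high bit set on every byte except the last.
pub fn encode_record_size(mut size: u32, out: &mut Vec<u8>) {
    loop {
        let low = (size & 0x7F) as u8;
        size >>= 7;
        if size == 0 {
            out.push(low);
            return;
        }
        out.push(low | 0x80);
    }
}

/// Decode a size from the front of `input`, returning `(size, bytes_consumed)`.
/// Returns `None` on truncation or if the value would not fit in 32 bits.
pub fn decode_record_size(input: &[u8]) -> Option<(u32, usize)> {
    let mut value: u32 = 0;
    for (i, &byte) in input.iter().enumerate().take(5) {
        let group = u32::from(byte & 0x7F);
        if i == 4 && group > 0x0F {
            return None; // the fifth byte may only contribute the top 4 bits
        }
        value |= group << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Wrap an already-encoded message body in a sized-envelope record.
/// Payloads are assumed to be far below 4 GiB, so the cast does not truncate.
pub fn frame_sized_envelope(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(payload.len() + 6);
    out.push(SIZED_ENVELOPE_RECORD);
    encode_record_size(payload.len() as u32, &mut out);
    out.extend_from_slice(payload);
    out
}
```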

**Options:**
1. Hand-roll both framing ([MS-NMF]) and binary message encoding ([MC-NBFX] + [MC-NBFS]). Estimate ~3000 LoC across both layers, split roughly evenly: framing alone is ~1500 LoC; the binary XML codec, dictionary tables, and operation-action mapping are roughly the same again.
2. Switch ASB to its HTTP variant if the deployed AVEVA instance supports it (this would let us use a normal text SOAP/XML stack and skip both [MS-NMF] and [MC-NBFX]/[MC-NBFS] entirely).
3. Wrap the .NET ASB DLL in a process and call it via stdin/stdout JSON-RPC.

**Current best answer:** option 1 (hand-roll both layers). The two specs are public, the encoder is deterministic, and the .NET reference's `AsbMessageDumpBehavior` already produces ground-truth byte vectors for the dictionary and operation set we use. `quick-xml` may help with any auxiliary text-XML the wider stack uses, but it cannot decode the binary-encoded message bodies — that requires the [MC-NBFX] + [MC-NBFS] codec.

**Settles when:** `mxaccess-asb-nettcp` parses every captured request/reply byte-identical to the .NET reference's `IClientChannel` payload dump for the proven type matrix, including correct dictionary-ID resolution and round-trip of every observed binary XML node tag.

### R2 — Buffered subscription multi-sample body **(settled per option (a) — codec change landed under F44)**

**Severity: P3** (settled — codec accepts multi-record DataUpdate)

**Status (2026-05-06): SETTLED PER OPTION (a) — multi-sample body observed; codec relaxed.**

`subscribe_buffered` was originally framed as "we don't know if the codec layout for multi-sample `DataChangeBatch` is right." A first verification pass against `wwtools/mxaccesscli/docs/api-notes.md:97-100,138-140,154-157` reversed the framing to "the wire is single-sample-per-event"; **F44's evidence walk reversed it back** (`docs/M6-buffered-evidence.md`).

`captures/094-frida-buffered-separate-writer/frida-events.tsv:145` (`2026-04-25T21:40:34.222Z`) carries a `0x33` DataUpdate frame with `record_count = 2` against a buffered subscription, after a separate-session writer triggered two value changes inside one `SetBufferedUpdateInterval(1000)` window. Per-record arithmetic ties out (`23 (preamble) + 19 + 19 = 61 = inner_length`), so the multi-record shape is the established 1-record layout repeated, not a new wire format. The .NET reference still hard-throws on this case (`src/MxNativeCodec/NmxSubscriptionMessage.cs:71-74`); the Rust codec deliberately diverges and decodes it.
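
The length arithmetic above can be checked mechanically. The sketch below assumes, for illustration only, the sizes observed in capture 094 (a 23-byte preamble and 19-byte records); real record sizes depend on the value type, and `split_records` is a hypothetical helper rather than the `mxaccess-codec` decoder.

```rust
/// Sizes observed in capture 094 (assumption: fixed here for illustration;
/// a real decoder derives the record length from the value type).
const PREAMBLE_LEN: usize = 23;
const RECORD_LEN: usize = 19;

#[derive(Debug, PartialEq)]
pub enum SplitError {
    ZeroRecords,
    LengthMismatch { expected: usize, actual: usize },
}

/// Split a DataUpdate inner body into per-record slices, after checking that
/// `preamble + record_count * record_len` matches the body length exactly.
pub fn split_records(inner: &[u8], record_count: u16) -> Result<Vec<&[u8]>, SplitError> {
    if record_count == 0 {
        return Err(SplitError::ZeroRecords);
    }
    let expected = PREAMBLE_LEN + usize::from(record_count) * RECORD_LEN;
    if inner.len() != expected {
        return Err(SplitError::LengthMismatch { expected, actual: inner.len() });
    }
    // The same single-record layout repeats once per record.
    Ok(inner[PREAMBLE_LEN..].chunks_exact(RECORD_LEN).collect())
}
```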

The `OnBufferedDataChange` **public event shape** the wwtools api-notes describe (`hServer, hItem, MxDataType, value, quality, timestamp, statuses` — singular `value`) is correct. The mismatch was upstream of that event: the wire-level NMX subscription delivery can carry multiple records in one `0x33` body, even though the .NET compatibility server fans those out to one event each.

**Current best answer:** `mxaccess-codec` decodes `0x33` DataUpdate bodies of any positive `record_count`; `subscribe_buffered` continues to expose `Stream<Item = DataChange>`, fanning the records out one per Stream item. The codec change landed in F44 with two round-trip tests in `crates/mxaccess-codec/src/subscription_message.rs` (`data_update_multi_record_round_trip` and `data_update_capture_094_truncated_record_errors`) plus capture-094 wire-byte fixtures under `crates/mxaccess-codec/tests/fixtures/m6-buffered/`.

**Settles when:** ✅ settled per option (a). Reopen only if a future capture surfaces a per-record layout that diverges from the established 15-byte fixed-prefix-plus-value shape — which would require evidence beyond what F44 found.

### R3 — `OperationComplete` trigger unproven **(settled 2026-05-06 — Path A landed: synthesizer kernel + typed `OperationStatus` events ported)**

**Severity: P1** (was a blocker; settled per Path A — typed promotion landed via `MxStatus::from_packed_u32`)

**Status (2026-05-06): SETTLED PER PATH A.** The five-stage Ghidra walk that previously settled the verdict at "verbatim preserve" was extended with a sixth stage that found the actual byte→`MXSTATUS_PROXY` synthesizer. It is **`Lmx.dll!FUN_10100ce0`** — a single 4-byte u32 LE → `MxStatus` decoder used by every NMX-frame parser in `Lmx.dll`. Bit layout:

```
bit 31: success (-1 if set, 0 if clear)
bits 27..24: category (4 bits, masked by 0xF)
bits 23..20: detected_by (4 bits, masked by 0xF)
bits 15..0: detail (i16 — low 16 bits, signed)
bits 30..28, 19..16: reserved/padding
```
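
The bit layout translates directly into shifts and masks. The following is a minimal sketch of that decode and its inverse, not the shipped `MxStatus` code: `PackedStatus` is a hypothetical stand-in that keeps `category` and `detected_by` as raw 4-bit values instead of enums, and the inverse writes the reserved bits as zero, so the round trip is exact only for inputs whose reserved bits are clear.

```rust
/// Hypothetical stand-in for the typed status; the two classification
/// fields are kept as raw 4-bit values for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PackedStatus {
    pub success: i16,    // -1 if bit 31 is set, 0 otherwise
    pub category: u8,    // bits 27..24
    pub detected_by: u8, // bits 23..20
    pub detail: i16,     // bits 15..0, reinterpreted as signed
}

/// Decode a packed status word following the documented bit layout.
pub fn from_packed_u32(packed: u32) -> PackedStatus {
    PackedStatus {
        success: if packed & 0x8000_0000 != 0 { -1 } else { 0 },
        category: ((packed >> 24) & 0xF) as u8,
        detected_by: ((packed >> 20) & 0xF) as u8,
        detail: (packed & 0xFFFF) as u16 as i16,
    }
}

/// Re-pack a status; reserved bits (30..28 and 19..16) are emitted as zero.
pub fn to_packed_u32(status: PackedStatus) -> u32 {
    let success_bit: u32 = if status.success != 0 { 0x8000_0000 } else { 0 };
    success_bit
        | (u32::from(status.category & 0xF) << 24)
        | (u32::from(status.detected_by & 0xF) << 20)
        | u32::from(status.detail as u16)
}
```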

The Rust port now ships this kernel as [`MxStatus::from_packed_u32`] (and the inverse `to_packed_u32` for round-trip parity). `Session::operation_status_events()` emits typed [`OperationStatus`] events for every `0x32`/`0x33`-or-similar callback the wire delivers; the synthesizer is byte-deterministic and context-free, so the operation-tracking state machine the original verdict deferred is **not** required for the kernel itself. Per-operation context tracking (correlating completion frames back to outstanding writes/subscribes) is filed as a follow-up: see F54 below.

A second mapping was also ported: `MxStatus::from_nmx_response_code` covers the constructed-from-response-code path in `Lmx.dll!FUN_1010bd10:741-770` (`ScanOnDemandCallback::GetResponse`), which builds an `MxStatus` from a 1-byte NMX `responseCode` field when no payload status word is present. Six proven mappings: `0x01`/`0x02` → `(CommunicationError, RequestingNmx)`, `0x03` → `(ConfigurationError, RequestingNmx)`, `0x04` → `(ConfigurationError, RespondingNmx)`, `0x05` → `(CommunicationError, RespondingNmx)`, `0x1A` → `(CommunicationError, RequestingNmx)`. Unmapped codes return `None` and the consumer falls back to verbatim preservation per CLAUDE.md "Do not fabricate protocol behavior."
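
The six mappings can be written as a single `match`. This sketch returns only the `(category, detected_by)` pair, since the `success` and `detail` values the real `from_nmx_response_code` assigns are not stated here; the enum and function names are illustrative, not the `mxaccess-codec` types.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    CommunicationError,
    ConfigurationError,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DetectedBy {
    RequestingNmx,
    RespondingNmx,
}

/// Map a 1-byte NMX response code to its proven classification.
/// Anything outside the six proven codes yields `None`, so the caller can
/// fall back to preserving the raw byte instead of guessing.
pub fn classify_response_code(code: u8) -> Option<(Category, DetectedBy)> {
    match code {
        0x01 | 0x02 | 0x1A => Some((Category::CommunicationError, DetectedBy::RequestingNmx)),
        0x03 => Some((Category::ConfigurationError, DetectedBy::RequestingNmx)),
        0x04 => Some((Category::ConfigurationError, DetectedBy::RespondingNmx)),
        0x05 => Some((Category::CommunicationError, DetectedBy::RespondingNmx)),
        _ => None,
    }
}
```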

**What about the 1-byte completion frames `0x00`/`0x41`/`0xEF`?** Those are NOT decoded by `FUN_10100ce0` — they're a different wire field (the NMX operation-status callback payload, not the `INmxService.GetResponse2 responseCode` parameter). `Lmx.dll`'s decoder for those frames does not invoke any status-synthesis logic; they propagate as raw byte → `MxStatus { success: 0, Unknown, Unknown, detail: byte }`. The Rust port preserves this exactly. R4 is settled by the same fact (see below).

**Aside — the .NET-reference shim was always half-implemented.** Verified at `src/MxNativeClient/MxNativeCompatibilityServer.cs:756` + `src/MxNativeCodec/NmxOperationStatusMessage.cs:18`: `MxNativeCompatibilityServer` fires `WriteCompleted` only when `IsMxAccessWriteComplete` is true, which gates strictly on `Format == StatusWord && StatusCode == 0x8050 && CompletionCode == 0x00` — i.e. the one exact 5-byte pattern `00 00 50 80 00` (= `MxStatus.WriteCompleteOk`). Every other completion frame (the 1-byte `0x00`/`0x41`/`0xEF` ones and any non-success status word) is silently dropped at the gate. The native consumer-facing `WriteCompleted` event has therefore **only ever fired for unambiguous successful writes** — failure outcomes have been invisible at the compatibility-shim layer for the entire history of the .NET reference. Path A's kernel (`from_packed_u32`) closes this asymmetry on the Rust side: `Session::operation_status_events()` exposes **all** typed outcomes the upstream synthesizer produces, not just the WriteCompleteOk slice. The Rust port now has strictly broader operation-status visibility than the .NET reference offered.

Logs:
- `analysis/ghidra/exports/Lmx.dll.aadct-decompile.md` — `aaDCT` symbol (stage 1)
- `analysis/ghidra/exports/LmxProxy.dll.completion-status-decompile.md` — Fire_* event handlers (stage 2)
- `analysis/ghidra/exports/LmxProxy.dll.fire-event-xrefs.md` — xrefs to Fire_* (stage 3)
- `analysis/ghidra/exports/LmxProxy.dll.status-synthesis-decompile.md` — Fire_* callers (stage 4)
- `analysis/ghidra/exports/LmxProxy.dll.mxstatus-safearray-decompile.md` — `FUN_10003f60` (stage 5)
- `analysis/ghidra/exports/Lmx.dll.set-attribute-result-decompile.md` — `PreboundReference::OnSetAttributeResult` (stage 6, entry to next ring)
- `analysis/ghidra/exports/Lmx.dll.set-attribute-result-xrefs.md` — xrefs to `OnSetAttributeResult`/`CancelWithStatus`/`OperationComplete` (next-ring discovery)
- `analysis/ghidra/exports/Lmx.dll.synthesizer-decompile.md` — `ScanOnDemandCallback::OperationComplete`/`MultipleOperationComplete` (`FUN_1010b990`), `RemotePlatformResolver::OperationComplete` (`FUN_1010dc80`), and the constructed-from-responseCode synthesizer `FUN_1010bd10` (lines 698-770)
- `analysis/ghidra/exports/Lmx.dll.synthesizer-helpers-decompile.md` — `FUN_10003fc0` (the `<success %d category %d ...>` formatter), `FUN_1008f150` (the dispatch helper), `PreboundReference` constructors
- `analysis/ghidra/exports/Lmx.dll.synthesizer-helpers2-decompile.md` — **the synthesizer kernel `FUN_10100ce0`** (4-byte u32 → `MxStatus` decoder), `FUN_10100bc0` (3×u16 reader), `FUN_1005e580` (4-byte stream reader), `FUN_1010ee00` (sister NMX-frame parser using the same kernel)
- `analysis/ghidra/exports/Lmx.dll.synthesizer-callers-xrefs.md` — caller graph for the synthesizer ring

Findings, layer by layer (the wire bytes flow inward; the synthesis flows outward):

1. **`Lmx.aaDCT`** at `0x10178fc0` is a `SysAllocString(L"Lmx.aaDCT")` into a global BSTR — a tracing category name, not a status-mapping table. No array / lookup logic.
2. **`MXSTATUS_PROXY`** (16 bytes, Pack=4) is a 4-field marshalled struct: `success: i16` at offset 0, `category: i16` at offset 4, `detectedBy: i16` at offset 8, `detail: i16` at offset 12. It is the *output* of synthesis, not a lookup-table entry.
3. **`LmxProxy.dll` Fire_* event handlers** (`FUN_10015f72`, `FUN_1001611f`, `FUN_10016271`, `FUN_100163c0`) take an *already-populated* `MXSTATUS_PROXY[]` and forward it through ATL connection-point dispatch. No synthesis here.
4. **`LmxProxy.dll` Fire_* callers** (`FUN_1001657f` for OnDataChange / OnBufferedDataChange, `FUN_10016b50` for OnWriteComplete, `FUN_10016d4b` for OperationComplete) call **`FUN_10003f60(out_safearray, in_status_ptr, count=1)`** which creates the SafeArray. `FUN_10003f60` is **a verbatim memcpy** of an existing 14-byte buffer into the SAFEARRAY data — no transformation.
5. **`Lmx.dll` `PreboundReference::OnSetAttributeResult`** (`FUN_10114a90`) — the CALLER of step 4's path — receives an already-populated `short *param_7` status buffer; synthesis is upstream of THIS function too.
6. **The synthesizer kernel itself**: **`Lmx.dll!FUN_10100ce0`** (see `analysis/ghidra/exports/Lmx.dll.synthesizer-helpers2-decompile.md`). A 4-byte u32 LE read from a stream → 4-tuple `MxStatus` decoder. Pure transformation, no operation-context dependency. Used by every NMX-frame parser in `Lmx.dll` (`FUN_1010bd10` `ScanOnDemandCallback::GetResponse`, `FUN_1010ee00` `AccessManager::ProcessNmxRequest`, `FUN_10110986`, etc.) — the upstream decoder reads the wire bytes, the kernel translates them.
7. **The constructed-when-no-bytes path**: when an NMX `responseCode != 0` arrives without a payload status word, `FUN_1010bd10:741-770` constructs an `MxStatus` from the responseCode itself via a fixed switch. Six proven response codes (1, 2, 3, 4, 5, 0x1A); see the table in the `MxStatus::from_nmx_response_code` doc.
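
The `MXSTATUS_PROXY` layout from step 2 can be mirrored in Rust to sanity-check the offsets. This is a sketch under two assumptions: the fields are little-endian (the binaries are x86), and the explicit padding fields are an illustration device rather than decompiled names. Reading needs only 14 bytes, since the last field starts at offset 12 and is 2 bytes wide, which is consistent with the 14-byte copy in step 4 if that copy stops after `detail`.

```rust
/// Mirror of the 16-byte layout: four i16 fields on a 4-byte stride.
/// The `_pad*` fields are hypothetical and exist only to pin the offsets.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MxStatusProxy {
    pub success: i16,
    _pad0: i16,
    pub category: i16,
    _pad1: i16,
    pub detected_by: i16,
    _pad2: i16,
    pub detail: i16,
    _pad3: i16,
}

/// Size of the mirrored struct; expected to be 16 given the layout above.
pub const PROXY_SIZE: usize = std::mem::size_of::<MxStatusProxy>();

/// Read the four fields from a raw status buffer (assumption: little-endian).
/// Returns `None` if the buffer is too short to contain `detail`.
pub fn read_proxy(buf: &[u8]) -> Option<MxStatusProxy> {
    if buf.len() < 14 {
        return None;
    }
    let field = |offset: usize| i16::from_le_bytes([buf[offset], buf[offset + 1]]);
    Some(MxStatusProxy {
        success: field(0),
        _pad0: 0,
        category: field(4),
        _pad1: 0,
        detected_by: field(8),
        _pad2: 0,
        detail: field(12),
        _pad3: 0,
    })
}
```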

**Path A landed.** The synthesizer kernel and the constructed-from-response-code switch were both portable as pure functions — no operation-tracking state machine required for the kernel itself, because `FUN_10100ce0` is byte-deterministic. Rust port:

- `mxaccess-codec::status::MxStatus::from_packed_u32(packed: u32) -> MxStatus` — the kernel.
- `mxaccess-codec::status::MxStatus::to_packed_u32() -> u32` — inverse, for round-trip parity.
- `mxaccess-codec::status::MxStatus::from_nmx_response_code(byte: u8) -> Option<MxStatus>` — the response-code switch.
- `mxaccess::OperationKind` + `mxaccess::OperationContext` types for future correlation work (per-operation tracking is filed as F54).
- `mxaccess::Session::operation_status_events()` returns `broadcast::Receiver<Arc<OperationStatus>>`; `operation_status_stream()` returns the `Stream<Item = Result<...>>` variant.
- `mxaccess::OperationStatus { raw, status, context, is_during_recovery }` — matches `MxNativeOperationStatusEvent` (`MxNativeSession.cs:73-78`) plus typed `MxStatus` promotion.
- The callback router (`session::callback_router`) now tries operation-status parsing first, mirroring `MxNativeSession.OnCallbackReceived:574`.

**What about the 1-byte completion frames `0x00`/`0x41`/`0xEF`?** They are NOT decoded by `FUN_10100ce0` (they're a different wire field at a different layer — the NMX operation-status callback payload, not the `INmxService.GetResponse2` responseCode parameter). Per CLAUDE.md "Do not fabricate protocol behavior" they continue to propagate as `MxStatus { success: 0, Unknown, Unknown, detail: byte }`. R4 is settled by the same fact.

**Current best answer:** Path A landed. `Session::operation_status_events()` emits typed `OperationStatus` events. The synthesizer kernel (`MxStatus::from_packed_u32`) is exposed for any consumer that holds a 4-byte packed status word (e.g. extracted from a subscription record's `status: i32` field). Per-operation context (correlating completion frames back to outstanding writes/subscribes) is the next step — filed as F54.

**Reopen when:** F54 lands per-operation correlation, or a future capture surfaces a fresh wire field whose synthesis logic doesn't reduce to `FUN_10100ce0` + `from_nmx_response_code` (no such field has been observed to date).

### R4 — Completion-only byte mapping **(settled 2026-05-06 — verbatim-preserve confirmed; synthesizer doesn't apply at this layer)**

**Severity: P1** (was a blocker; now settled per the same R3 Path A finding — by exclusion)

**Status (2026-05-06): SETTLED.** R3's Path A walk traced the byte→`MxStatus` synthesizer to **`Lmx.dll!FUN_10100ce0`**, a 4-byte u32 LE → `MxStatus` decoder. The 1-byte completion frames `0x00`, `0x41`, `0xEF` (`work_remain.md:164–174`) are NOT input to that decoder — they're a different wire field, observed at a different layer (the NMX operation-status callback payload, not the `INmxService.GetResponse2` responseCode parameter or any 4-byte packed status field). `Lmx.dll`'s decoder for the 1-byte completion-only inner body does not invoke any synthesis logic; the bytes propagate untransformed.

**Current best answer:** unchanged — preserve as `MxStatus { Success: 0, Category: Unknown, DetectedBy: Unknown, Detail: byte }`. `mxaccess-codec::NmxOperationStatusMessage::promote_to_typed` returns the verbatim placeholder for these frames; `mxaccess::Session::operation_status_events()` surfaces them via the typed `OperationStatus.status` field with the byte preserved in `detail`.
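
As a minimal illustration of the verbatim placeholder, the sketch below carries a completion byte into `detail` without interpreting it. The types are hypothetical stand-ins for the `mxaccess-codec` ones, and zero-extending the byte into an `i16` is an assumption; the text above only states that the byte is preserved.

```rust
/// Placeholder classification used when no synthesis rule is proven.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Classification {
    Unknown,
}

/// Hypothetical stand-in for the typed status record.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Status {
    pub success: i16,
    pub category: Classification,
    pub detected_by: Classification,
    pub detail: i16,
}

/// Carry a 1-byte completion code through untransformed.
/// Assumption: the byte is zero-extended into `detail`.
pub fn verbatim_placeholder(completion_byte: u8) -> Status {
    Status {
        success: 0,
        category: Classification::Unknown,
        detected_by: Classification::Unknown,
        detail: i16::from(completion_byte),
    }
}
```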

**Reopen when:** a fresh capture proves a synthesis rule for a specific 1-byte completion code under a specific operation context (e.g. via a Frida capture that pairs the `LmxProxy.dll!FUN_10003f60` input with the observed event payload). At that point file a sub-followup with the captured `(byte, context, observed status)` triple and decide whether to add a typed mapping.

### R5 — Activate / Suspend behaviour **(SETTLED 2026-05-06 — F50 live capture proves Suspend is server-side wire op `0x2D`; Activate against a non-suspended item is client-side only)**

**Severity: P3** (downgraded from P2 — wire behaviour now characterised, no implementation gap blocking M6 / V1 since `Session::suspend` / `Session::activate` aren't part of the public API today; if/when added, the `0x2D` opcode is the encoder target).

**Settled (2026-05-06):** F50 captured `123-frida-suspend-advised-instrumented/` and `124-frida-activate-advised-instrumented/`. See `docs/F50-suspend-activate-evidence.md` for the byte-level evidence. Summary:
- **Suspend** emits NMX `PutRequest` with command byte `0x2D` ~140ms after the LMX-proxy entry hook; the body shape matches AdviseSupervisory's `<command:1> <version:2> <correlation_id:16> <body:22>` family.
- **Activate** (against a non-suspended item, the only scenario the harness sequences) returns synchronously client-side with no wire traffic; same client-side behaviour F44 documented for capture 077.
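
The `<command:1> <version:2> <correlation_id:16> <body:22>` shape can be split with plain slicing. The sketch below is a hypothetical helper, not the port's encoder: it assumes the 2-byte version is little-endian and treats the body as opaque. See `docs/F50-suspend-activate-evidence.md` for the actual captured bytes.

```rust
/// Command byte observed for Suspend in the F50 capture.
pub const SUSPEND_COMMAND: u8 = 0x2D;

/// Fixed header: command (1) + version (2) + correlation id (16).
const HEADER_LEN: usize = 1 + 2 + 16;

#[derive(Debug, PartialEq, Eq)]
pub struct PutRequestParts<'a> {
    pub command: u8,
    pub version: u16,
    pub correlation_id: [u8; 16],
    pub body: &'a [u8],
}

/// Split a PutRequest payload into header fields and an opaque body.
/// Returns `None` if the frame is shorter than the fixed header.
pub fn split_put_request(frame: &[u8]) -> Option<PutRequestParts<'_>> {
    if frame.len() < HEADER_LEN {
        return None;
    }
    let mut correlation_id = [0u8; 16];
    correlation_id.copy_from_slice(&frame[3..HEADER_LEN]);
    Some(PutRequestParts {
        command: frame[0],
        // Assumption: little-endian, matching the other wire fields in this document.
        version: u16::from_le_bytes([frame[1], frame[2]]),
        correlation_id,
        body: &frame[HEADER_LEN..],
    })
}
```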

**Prior status (pre-F50, superseded by the settlement above): PARTIALLY OBSERVED — Frida hooks ready, live capture pending.**
The text below is the pre-F50 record, kept for history.

F44's evidence walk on
`captures/077-frida-suspend-advised-scanstate/` (per `docs/M6-buffered-evidence.md`)
documents:

- `Suspend` returns synchronously with `MxStatus.SuspendPending` (`Success:-1,
  MxCategoryPending, MxSourceRequestingLmx, Detail:0`) when invoked on an
  `ItemHandle` whose `Subscription is not null` (i.e. immediately after a
  successful `Advise` / `AdviseSupervisory`).
- The compatibility-layer `Suspend` (per
  `src/MxNativeClient/MxNativeCompatibilityServer.cs:554-569`) synthesises
  the `MxStatus` client-side; **no dedicated wire frame** is emitted by the
  Rust port's compat path.

What capture 077 could **not** answer: whether the production
`LmxProxy.dll` stack issues a separate ORPC method for `Suspend` / `Activate`
(e.g. an `ILMXProxyServer5` opnum) or also handles them client-side. Capture
077's Frida script did not hook
`LmxProxy.dll!CLMXProxyServer.Suspend`/`.Activate`, so the wire-side
behaviour is invisible.

**Next step — F46.** `analysis/frida/mx-nmx-trace.js` now carries
`Interceptor.attach` blocks for `LmxProxy.dll!CLMXProxyServer.Suspend`
(RVA `0x13d9c`, `FUN_10013d9c`) and `.Activate` (RVA `0x14028`,
`FUN_10014028`), emitting `mx.suspend.begin/end` and
`mx.activate.begin/end` events with the `MxStatus*` out-parameter
decoded as 4 × int16. No `Resume` / `Reactivate` symbols exist in
`LmxProxy.dll` — verified against
`analysis/ghidra/exports/LmxProxy.dll.ghidra.md` and the decompiled
`ILMXProxyServer5` / `ILMXProxyServer4` interfaces. R5 stays open
until a live re-run on the AVEVA host produces
`captures/NNN-frida-suspend-activate-instrumented/` per the procedure
documented at the top of `analysis/frida/mx-nmx-trace.js`.

**Best answer before F50 (superseded):** expose `Session::suspend(item)` and
`Session::activate(item)` returning `Result<MxStatus, Error>`. The success
criteria match the .NET reference's client-side gating: the item must have
an active subscription. If F46's wire capture later proves the LMX proxy
issues a separate ORPC method, add the wire emission here in M6 follow-up.
Do not build callback-driven state transitions on top until F46 settles.

**Settles when (pre-F50 criterion, since met by the F50 captures):** F45 produces a Frida capture instrumenting
`LmxProxy.dll!CLMXProxyServer.Suspend` / `.Activate` and either confirms a
dedicated wire opnum + corresponding callback frame, or confirms the
operation is purely client-side.

### R6 — `0x80004021` in `MxNativeSession.WriteSecuredAsync` is a .NET-reference defect, not a real LMX constraint

**Severity: P3** (formerly P1 — downgraded after `wwtools/mxaccesscli/` verification)

Original framing of this risk asserted that "`WriteSecured` (without `2`) returns `0x80004021` before sending the body" and concluded the single-token form was deprecated or rejected at the wire. That framing was wrong. Verification against `wwtools/mxaccesscli/` (a working CLI built on the production `LMXProxyServerClass` 32-bit COM proxy, i.e. the actual MxAccess surface) establishes:

- The LMX `WriteSecured` ALWAYS takes **two** user ids: `(currentUserId, verifierUserId, value)` (`wwtools/mxaccesscli/docs/api-notes.md:60-72`, `wwtools/mxaccesscli/src/MxAccess.Cli/Mx/MxItem.cs:69-70`).
- "Single-user secured write" is the same API called with `currentUserId == verifierUserId` — it is **not** a separate API surface (`wwtools/mxaccesscli/src/MxAccess.Cli/Commands/WriteCommand.cs:151-155,196-199`).
- `WriteSecured2` adds a timestamp parameter; it does **not** add a second token. The 1-vs-2 distinction in this design's earlier drafts was a confusion between "with timestamp" (Write2 vs Write) and "two-token" (which is always true).
- The `0x80004021` failure observed in `src/MxNativeClient/MxNativeSession.cs:218-221` is therefore a defect of the .NET native reimplementation, not behaviour the LMX proxy itself produces.

**Current best answer:** `mxaccess` exposes `write_secured(reference, value, current_user_id, verifier_user_id)` (no timestamp) and `write_secured_at(reference, value, timestamp, current_user_id, verifier_user_id)` (with timestamp), matching `WriteSecured` and `WriteSecured2` respectively. Both always pass two user ids; callers performing single-user secured writes pass the same id twice. The `Error::Unsupported` mapping for "single-token form" has been removed from `50-error-model.md`.
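
To make the two-id rule concrete, here is a minimal sketch of the call shapes. It only builds a request description and sends nothing; the `u32` user-id type, the `f64` value type, and the `SecuredWrite` struct are all placeholders chosen for illustration, and the real `mxaccess` signatures may differ.

```rust
use std::time::SystemTime;

/// Hypothetical request description; field types are placeholders.
#[derive(Debug, Clone, PartialEq)]
pub struct SecuredWrite {
    pub reference: String,
    pub value: f64,
    pub timestamp: Option<SystemTime>,
    pub current_user_id: u32,
    pub verifier_user_id: u32,
}

/// Counterpart of `WriteSecured`: two user ids, no timestamp.
pub fn write_secured(
    reference: &str,
    value: f64,
    current_user_id: u32,
    verifier_user_id: u32,
) -> SecuredWrite {
    SecuredWrite {
        reference: reference.to_owned(),
        value,
        timestamp: None,
        current_user_id,
        verifier_user_id,
    }
}

/// Counterpart of `WriteSecured2`: same two user ids plus a timestamp.
pub fn write_secured_at(
    reference: &str,
    value: f64,
    timestamp: SystemTime,
    current_user_id: u32,
    verifier_user_id: u32,
) -> SecuredWrite {
    SecuredWrite {
        timestamp: Some(timestamp),
        ..write_secured(reference, value, current_user_id, verifier_user_id)
    }
}
```

A single-user secured write is then `write_secured("Tank.Level", 42.0, user, user)` (the tag name is made up): the same id in both positions, not a separate entry point.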

**Settles when:** the `MxNativeSession.WriteSecuredAsync` defect is fixed in the .NET reference, or a captured frame shows the LMX proxy itself producing `0x80004021` on a `WriteSecured` call (which would resurrect the original framing). Default-positive: this likely settles silently as "not a real risk."

### R7 — Status mapping for non-success ASB cases

**Severity: P2** (nice-to-have / minor — unknown bytes preserved as raw)

`work_remain.md:132–143`: live probes have not yet exercised access-denied and no-communication on the current VM. The Rust port mirrors what the .NET reference proves; remaining ASB error/quality/detail bytes are preserved as raw and surfaced through `MxStatus.detail` until a safe live capture lands.

**Current best answer:** preserve unknown payloads. Document the gap.

**Settles when:** live capture against a configured access-denied tag and a no-communication endpoint produces the expected `MxStatus` shape.

## Implementation-level

### R8 — NTLMv2 cross-domain auth **(permanently deferred 2026-05-06 — external infrastructure gap)**

**Severity: P1** (significant blocker for cross-domain deployments — single-domain ships)

**Status (2026-05-06): PERMANENTLY DEFERRED.** The implementation already parses NTLM AV pairs per [MS-NLMP] §2.2.2.1, including the cross-domain AV pair shapes (`MsvAvDnsTreeName`, `MsvAvDnsComputerName` carry the trusted-domain DNS suffix instead of the local one). What's missing is the *live capture* needed to pin a regression fixture — and that requires a multi-domain Windows lab (e.g. `LAB-A` + `LAB-B` with cross-domain trust + an AVEVA install on `LAB-A` authenticating a `LAB-B`-domain user) which is not available on the dev host. Same external-infrastructure constraint as `F3` in `design/followups.md`. R8 is closed in the same sense F3 is closed — the implementation is in place per spec; only the evidence is gated on hardware that doesn't exist here.

Captured traffic is single-domain (local AVEVA install). Cross-domain NTLM exercises the AV pair codepaths but the bytes haven't been pinned.

**Current best answer:** the AV pair parser handles the cross-domain shape per [MS-NLMP] §2.2.2.1; document `mxaccess-rpc` as untested across domains in the README. The `mxaccess-rpc::ntlm` round-trip tests cover the single-domain shape; cross-domain round-trips through the same code path (the AV pair parser is shape-agnostic) but no live fixture pins it.
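
"Shape-agnostic" here means the parser walks `AvId` / `AvLen` / value triples without caring what the values contain. The following is a minimal sketch of such a walk over the [MS-NLMP] §2.2.2.1 `AV_PAIR` list (little-endian 16-bit id and length, terminated by `MsvAvEOL`). It is an illustration with hypothetical names, not the `mxaccess-rpc::ntlm` code.

```rust
// AvId values from [MS-NLMP] §2.2.2.1.
pub const MSV_AV_EOL: u16 = 0x0000;
pub const MSV_AV_DNS_COMPUTER_NAME: u16 = 0x0003;
pub const MSV_AV_DNS_TREE_NAME: u16 = 0x0005;

#[derive(Debug, PartialEq, Eq)]
pub struct AvPair<'a> {
    pub id: u16,
    pub value: &'a [u8],
}

/// Walk an AV_PAIR list up to and including the MsvAvEOL terminator.
/// Returns `None` if the list is truncated or the terminator is malformed.
pub fn parse_av_pairs(mut input: &[u8]) -> Option<Vec<AvPair<'_>>> {
    let mut pairs = Vec::new();
    loop {
        if input.len() < 4 {
            return None; // ran out of bytes before seeing MsvAvEOL
        }
        let id = u16::from_le_bytes([input[0], input[1]]);
        let len = usize::from(u16::from_le_bytes([input[2], input[3]]));
        let rest = &input[4..];
        if id == MSV_AV_EOL {
            // The terminator carries a zero length.
            return if len == 0 { Some(pairs) } else { None };
        }
        if rest.len() < len {
            return None;
        }
        pairs.push(AvPair { id, value: &rest[..len] });
        input = &rest[len..];
    }
}

/// Name-valued pairs are UTF-16LE without a terminator.
pub fn utf16le_to_string(value: &[u8]) -> Option<String> {
    if value.len() % 2 != 0 {
        return None;
    }
    let units: Vec<u16> = value
        .chunks_exact(2)
        .map(|c| u16::from_le_bytes([c[0], c[1]]))
        .collect();
    String::from_utf16(&units).ok()
}
```

Nothing in the walk depends on which DNS suffix a value carries, which is why a cross-domain challenge takes the same path as a single-domain one.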

**Reopen when:** a multi-domain AVEVA test harness becomes available and a cross-domain probe runs successfully end-to-end with packet-integrity signatures verified. Until then, this risk is permanently deferred, the same status pattern as F3. A self-contained provisioning recipe (lab topology, DC/DNS/trust setup, capture procedure, fixture layout, round-trip test skeleton) is at `docs/F3-cross-domain-ntlm-recipe.md`.

### R9 — DPAPI dependency for ASB

**Severity: P2** (nice-to-have / minor — explicit `shared_secret` constructor is the escape hatch)

ASB shared-secret retrieval uses `ProtectedData.Unprotect` (LocalMachine scope). Linux has no DPAPI. There is no portable replacement; the secret is encrypted at rest under a machine-bound Windows DPAPI key.

**Current best answer:** `mxaccess-asb` requires Windows for the credential read path. Provide an explicit `AsbCredentials::shared_secret(secret: &[u8])` constructor that bypasses DPAPI for tooling that has the secret in plaintext (e.g. CI tests, ops automation).
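
A sketch of what the escape-hatch constructor could look like follows. Only the `AsbCredentials::shared_secret(secret: &[u8])` signature comes from the text above; the field, the accessor, and the redacting `Debug` impl are assumptions added for illustration.

```rust
use std::fmt;

/// Hypothetical credential holder; only the constructor signature is
/// taken from the design text.
#[derive(Clone, PartialEq, Eq)]
pub struct AsbCredentials {
    secret: Vec<u8>,
}

impl AsbCredentials {
    /// Build credentials from a plaintext secret, bypassing DPAPI entirely.
    pub fn shared_secret(secret: &[u8]) -> Self {
        Self { secret: secret.to_vec() }
    }

    /// Borrow the secret bytes for use by the transport layer.
    pub fn secret(&self) -> &[u8] {
        &self.secret
    }
}

/// Keep the secret out of logs and panic messages.
impl fmt::Debug for AsbCredentials {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("AsbCredentials { secret: <redacted> }")
    }
}
```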

**Settles when:** never. DPAPI is not portable; the escape hatch is the explicit constructor.

### R10 — Galaxy SQL schema versioning

**Severity: P1** (significant blocker per affected feature — break-loud on mismatch)

The recursive CTE in `GalaxyRepositoryTagResolver.cs` assumes the current AVEVA schema. Older Galaxy versions may have different table layouts.

**Current best answer:** target the schema that ships with the AVEVA version `MxNativeClient` validates against. Document the expected schema version. Break loudly on mismatch (`ConfigError::Galaxy { reason }`).

**Settles when:** a multi-version test matrix is set up. Probably not in V1.

### R11 — x86 proxy/stub workaround

**Severity: P2** (nice-to-have / minor — integration test catches binding-shape drift)

`NmxSvcps.dll` is x86-only. The replacement strategy bypasses the in-proc proxy by speaking ORPC directly. This works because we control both Type1/Type3 marshalling and `RemQueryInterface`. But it depends on `NmxSvc` continuing to expose IPv4 NCACN_IP_TCP bindings via the OXID.

**Current best answer:** add an `mxaccess-rpc` integration test that asserts `ResolveOxid` returns at least one `ncacn_ip_tcp` binding. Fail fast if the binding shape changes in a future AVEVA release.
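
The assertion itself is a one-line search once the string bindings are parsed. The sketch below assumes the bindings have already been decoded into `(tower_id, network_address)` pairs; `StringBinding` and `require_tcp_binding` are hypothetical names, and `0x07` is the protocol-sequence id for `ncacn_ip_tcp`.

```rust
/// Tower id (protocol sequence id) for `ncacn_ip_tcp`.
pub const TOWER_ID_NCACN_IP_TCP: u16 = 0x0007;

/// Hypothetical decoded form of one string binding from a ResolveOxid reply.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StringBinding {
    pub tower_id: u16,
    pub network_address: String,
}

/// Return the first TCP binding, or a descriptive error if there is none,
/// so a binding-shape change fails loudly instead of timing out later.
pub fn require_tcp_binding(bindings: &[StringBinding]) -> Result<&StringBinding, String> {
    bindings
        .iter()
        .find(|b| b.tower_id == TOWER_ID_NCACN_IP_TCP)
        .ok_or_else(|| {
            format!(
                "ResolveOxid returned {} binding(s), none of them ncacn_ip_tcp",
                bindings.len()
            )
        })
}
```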

**Settles when:** that integration test is in CI gating.

### R12 — Performance — codec allocations

**Severity: P2** (nice-to-have / minor — micro-optimisation in M6)

The .NET reference reuses `byte[]` arrays via `MemoryPool`; the Rust port should use `bytes::Bytes` for zero-copy on receive and pre-allocate via `BytesMut` on encode. The codec currently allocates `Vec<u8>` per encode; tolerable for V1, worth optimising in M6.

**Current best answer:** use `BytesMut::with_capacity(MAX_FRAME)` per session. Bench in M6. Aim for < 5 allocations per write at steady state.
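
The reuse pattern can be shown without the `bytes` crate. The sketch below swaps in a plain `Vec<u8>` as a std-only stand-in for `BytesMut` (an explicit substitution, not the proposed implementation): the buffer is allocated once per session and cleared between frames, so encoding allocates nothing at steady state as long as frames fit the initial capacity. The opcode-plus-payload frame shape is made up for the example.

```rust
/// Per-session encoder that reuses one buffer across frames.
/// `Vec<u8>` stands in for `BytesMut` to keep the sketch dependency-free.
pub struct FrameEncoder {
    buf: Vec<u8>,
}

impl FrameEncoder {
    /// Allocate the buffer once, sized for the largest expected frame.
    pub fn with_capacity(max_frame: usize) -> Self {
        Self { buf: Vec::with_capacity(max_frame) }
    }

    /// Encode a frame into the reused buffer and return a view of it.
    /// `clear` keeps the allocation, so this does not reallocate while
    /// the frame fits within the existing capacity.
    pub fn encode(&mut self, opcode: u8, payload: &[u8]) -> &[u8] {
        self.buf.clear();
        self.buf.push(opcode);
        self.buf.extend_from_slice(payload);
        &self.buf
    }

    /// Current capacity, exposed so tests can detect reallocation.
    pub fn capacity(&self) -> usize {
        self.buf.capacity()
    }
}
```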

**Settles when:** `cargo bench` shows the target allocation count.

### R13 — DataUpdate `recordCount != 1` panic risk

**Severity: P1** (significant blocker for production stability — soft-error path documented)

`src/MxNativeCodec/NmxSubscriptionMessage.cs:71-74` hard-throws `ArgumentException` on any `0x33` DataUpdate whose `recordCount` is not exactly 1:

```csharp
if (recordCount != 1)
{
    throw new ArgumentException("Observed NMX DataUpdate callback parser currently supports one record per body.", nameof(inner));
}
```

R2 covers the missing **fixture** for the multi-record case, but the bigger production-side risk is separate: the first time AVEVA emits a multi-record `0x33` against a deployed Rust client, the codec — if it ports the .NET behaviour faithfully — will panic / return a hard decode error and tear down the subscription. We have no fixture proving multi-record bodies don't happen on real installs; we only have evidence they haven't happened on **our** install.

**Options:**
1. Mirror the .NET reference and hard-error on `recordCount != 1`. Loud, but kills the session.
2. Surface as a typed soft error (e.g. `ProtocolError::Decode { reason: "multi-record DataUpdate not yet supported" }`), log at warn, and drop the frame. The subscription stays alive; the consumer sees a single missed update, not a teardown.
3. Speculatively decode multi-record (assume the per-record layout from the single-record case repeats) — explicitly forbidden by CLAUDE.md "Do not fabricate protocol behavior."

**Current best answer:** option 2 in Rust. Map the condition to `ProtocolError::Decode { reason: "multi-record DataUpdate not yet supported" }`, emit a `tracing::warn!` with the raw frame bytes attached as a hex field, and continue. Do **not** synthesise per-record decoding. The .NET-style hard throw stays as-is in the .NET reference (it is the executable spec, and a panic there is what produces the fixture we need — see R2). The Rust port deliberately diverges here on production-safety grounds, with the divergence documented in `50-error-model.md`.

**Settles when:** R2's multi-record fixture lands and the codec gains a proven typed decode path; then R13 collapses into "supported, no special handling" and the soft-error branch becomes dead code that can be removed. R2 above records both as landed under F44 (capture-094 fixtures plus multi-record decoding), so this condition now appears to be met.
### R14 — Fabricated `0x80004021 → StaleItem` mapping
|
||
|
||
**Severity: P1** (significant blocker — fabrication risk; corrected in `50-error-model.md`)
|
||
|
||
A draft of `50-error-model.md` mapped `HRESULT 0x80004021` to a typed `StaleItem` error category for regular (non-secured) operations. **This mapping is unevidenced.**
|
||
|
||
- R6 already covers `0x80004021` on secured-write specifically: per `wwtools/mxaccesscli/` verification, this is a `MxNativeSession.WriteSecuredAsync` defect (the .NET native reimplementation throws `NotSupportedException` before reaching the wire), **not** a real LMX-proxy constraint. The production LMX surface accepts `WriteSecured` with two user ids unconditionally. R6 explicitly does **not** generalise the .NET defect to a typed "stale" error.
|
||
- For regular operations, the actual stale-handle / invalid-arg HRESULT observed in captures is `0x80070057` (`E_INVALIDARG`). There is no captured frame, decompiled mapping table, or live probe in this repo that produces `0x80004021` on a non-secured path, and certainly none that justifies tagging it `StaleItem`.
|
||
|
||
This is a fabrication risk: the kind of "looks plausible from naming" mapping that CLAUDE.md "Do not fabricate protocol behavior" exists to prevent.
|
||
|
||
**Options:**
|
||
1. Drop the `StaleItem` category entirely. Regular-op `0x80004021`, if ever observed, falls through to the generic `Hresult { code, hint: None }` branch with the raw HRESULT preserved.
|
||
2. Keep `StaleItem` but rename the source HRESULT to `0x80070057` and require a captured fixture before promoting any frame to that category.
|
||
3. Keep the `0x80004021 → StaleItem` mapping. **Forbidden** — no evidence backs it.
|
||
|
||
**Current best answer:** option 1 for V1. Surface unknown HRESULTs as `Error::Hresult { code }` and let consumers match on the raw value. `50-error-model.md` is being corrected in parallel (review cluster 3) to remove the `StaleItem` reference; this risk register entry exists so the mistake is recorded for future contributors and not silently re-introduced when someone reaches for an ergonomic typed name.
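
As a rough sketch of what option 1 looks like from the consumer side, assuming an `Error::Hresult { code }` variant shaped as in the text: the helper names `from_hresult` and `describe` are hypothetical, and the only named constant is the standard Windows `E_INVALIDARG` value. No typed "stale" category appears anywhere.

```rust
/// Error surface with the raw HRESULT preserved; no typed "stale" category.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Error {
    Hresult { code: u32 },
}

/// Standard Windows E_INVALIDARG value.
pub const E_INVALIDARG: u32 = 0x8007_0057;

/// Hypothetical conversion: every HRESULT passes through unchanged.
pub fn from_hresult(code: u32) -> Error {
    Error::Hresult { code }
}

/// Consumers match on the raw value; unknown codes are reported verbatim.
pub fn describe(err: &Error) -> String {
    match err {
        Error::Hresult { code: E_INVALIDARG } => "E_INVALIDARG (0x80070057)".to_string(),
        Error::Hresult { code } => format!("unmapped HRESULT 0x{code:08X}"),
    }
}
```

With this shape, `0x80004021` simply falls into the unmapped arm until evidence justifies a typed name.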
|
||
|
||
**Settles when:** indefinitely deferred — no current artifact maps either `0x80004021` or `0x80070057` to a "stale handle" semantic, and inventing one violates the "don't fabricate protocol behaviour" rule. If a future capture or decompiled mapping table produces evidence, reopen as a typed-error proposal.
|
||
|
||
### R15 — Drop-time async cleanup hazards
|
||
|
||
**Severity: P1** (significant blocker — server-side handle leak on runtime shutdown)
|
||
|
||
`design/00-overview.md:38` states the principle "no spawn from inside Drop." `design/20-async-layer.md` and `design/50-error-model.md` describe Subscription drop semantics that fire `UnAdvise`/`UnregisterEngine` against the server. Reconciling these is non-trivial because:
|
||
|
||
- `tokio::spawn` from `Drop` panics if no Tokio runtime is current at drop time. A user dropping a `Session` from a `std::thread` after `Runtime::shutdown_timeout` returns will hit this.
|
||
- During `Runtime::shutdown_timeout`, spawned tasks are aborted before they can flush. Even if a runtime is current, spawning the cleanup from `Drop` does not guarantee the unadvise/unregister actually reaches the server — the drop call returns immediately and the spawned task may be cancelled before the bytes hit the wire.
|
||
- The result is a **server-side handle leak in `NmxSvc`**: subscriptions stay live, registered engines stay registered, until the TCP connection itself is torn down (which only happens once the kernel notices the socket is dead).
|
||
|
||
**Options:**
|
||
1. Best-effort `tokio::spawn` from `Drop`. Documented hazard. Leaks on runtime shutdown and panics on no-runtime.
|
||
2. Drop sends `UnAdvise`/`UnregisterEngine` via a `tokio::sync::oneshot` (or unbounded `mpsc`) to a long-lived connection task that owns the cleanup loop. **Drop itself never spawns** — it pushes a message onto the channel and returns. The connection task drains the channel until the TCP connection is itself dropped, at which point the server cleans up by socket close anyway.
|
||
3. Require the consumer to call `Session::shutdown(timeout).await` and document Drop as "best-effort, may leak under shutdown" — no automatic cleanup at all.
|
||
|
||
**Current best answer:** option 2. A long-lived connection task owns the cleanup channel and drains it; `Drop` pushes an `UnAdvise`/`UnregisterEngine` request onto a `tokio::sync::oneshot` (one per resource) or a per-connection unbounded `mpsc` and returns synchronously. This keeps `Drop` cheap, satisfies "no spawn from Drop," and gives the cleanup a reasonable best-effort guarantee while the connection task is alive. **Runtime-shutdown leak window remains:** if the connection task is itself aborted by `Runtime::shutdown_timeout` before draining the channel, the cleanup messages are lost and the server-side handles remain registered until `NmxSvc` observes the TCP socket close. This window is documented in `50-error-model.md`'s cancellation semantics; consumers running under explicit shutdown should call `Session::shutdown(timeout).await` for deterministic cleanup. See `design/00-overview.md:38` (no-spawn-from-Drop principle), `design/20-async-layer.md` (Subscription drop semantics), and `design/50-error-model.md` (cancellation semantics).
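
The shape of option 2 can be illustrated with a dependency-free sketch. It swaps in `std::sync::mpsc` for the Tokio channel and a plain drain function for the connection task, so it demonstrates only the "Drop enqueues, never spawns" pattern, not the real async wiring. `Cleanup`, `Subscription`, `drain_cleanup`, and both demo functions are hypothetical names.

```rust
use std::sync::mpsc::{self, Receiver, Sender};

/// Cleanup requests the connection task would translate into wire calls.
#[derive(Debug, PartialEq, Eq)]
pub enum Cleanup {
    UnAdvise { handle: u32 },
    UnregisterEngine,
}

/// Hypothetical subscription handle holding the sending half of the channel.
pub struct Subscription {
    handle: u32,
    cleanup_tx: Sender<Cleanup>,
}

impl Drop for Subscription {
    fn drop(&mut self) {
        // Enqueue and return: no spawn, no await, no blocking.
        // A send error means the receiver is gone (the leak window); ignore it.
        let _ = self.cleanup_tx.send(Cleanup::UnAdvise { handle: self.handle });
    }
}

/// Stand-in for one pass of the connection task's drain loop.
pub fn drain_cleanup(rx: &Receiver<Cleanup>) -> Vec<Cleanup> {
    rx.try_iter().collect()
}

/// Dropping a subscription while the drain side is alive queues one request.
pub fn demo_live() -> Vec<Cleanup> {
    let (tx, rx) = mpsc::channel();
    drop(Subscription { handle: 7, cleanup_tx: tx });
    drain_cleanup(&rx)
}

/// Dropping after the drain side is gone must not panic; the request is lost.
pub fn demo_after_shutdown() -> bool {
    let (tx, rx) = mpsc::channel::<Cleanup>();
    let sub = Subscription { handle: 9, cleanup_tx: tx };
    drop(rx);
    drop(sub);
    true
}
```

The second demo mirrors the documented leak window: once the draining side has gone away, `Drop` still returns cleanly but nothing reaches the server.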
|
||
|
||
**Settles when:** the connection-task cleanup channel is implemented in M4, a stress test under churn confirms drop semantics on a live runtime do not leak, and the runtime-shutdown leak window is captured in a runnable test fixture (consumer drops `Session` after `Runtime::shutdown_timeout`; assert that the leak is bounded by socket-close timeout).
|
||
|
||
### R16 — Crypto/auth crate maintenance drift
|
||
|
||
**Severity: P1** (significant blocker — yank/advisory in CI breaks the build)
|
||
|
||
The auth surface area depends on a small cluster of marginal-maintenance crates. `design/30-crate-topology.md:130` pins `rc4`, `sha-1`, `md-5`, `num-bigint`; `design/10-raw-layer.md:252` instructs "Do not pull `ring` — hand-roll MD4." Of these:
|
||
|
||
- `rc4` is in minimum-maintenance mode, with a small maintainer pool and no recent releases.
|
||
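- `sha-1` v0.10 is the last RustCrypto release under that crate name and ships with a deprecation notice; the implementation continues upstream as the `sha1` crate, so the notice reflects the rename (and SHA-1's general deprecation as an algorithm), not a defect in the crate.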
|
||
- `md-5` and `num-bigint` are stable but not on the active-development frontier.
|
||
- The hand-rolled MD4 in `mxaccess-rpc` has no upstream at all — it lives in this repo.
|
||
|
||
The risk is that any one of these crates gets **yanked**, picks up an `RUSTSEC` advisory, or stops compiling against a future Rust toolchain, and `cargo-deny` (or `cargo audit`) in CI fails the build for everyone — without any actual bug being found in our usage. This is especially bad if it happens during a live release window.
|
||
|
||
**Options:**
|
||
1. Pin to known-good versions in workspace `Cargo.toml` and let CI break when an advisory lands. Triage manually.
|
||
2. Pin **and** subscribe to `cargo-deny` advisory feeds with a documented response process; pre-stage replacement plans for each crate (e.g. "if `rc4` is yanked, fall back to a hand-rolled cipher in `mxaccess-rpc::crypto::rc4` — RC4 is ~30 LoC and we already hand-roll MD4").
|
||
3. Hand-roll all of them up front (RC4, SHA-1, MD5, MD4 are all small) and depend on `num-bigint` only. Reduces the surface area to one external crate; increases the in-repo cryptographic LoC.
|
||
|
||
**Current best answer:** option 2 for V1. Pin to known-good versions in workspace `Cargo.toml`; subscribe `cargo-deny` advisories in CI; document a fallback plan per crate (hand-rolled RC4 if `rc4` is yanked, hand-rolled SHA-1/MD5 if `sha-1`/`md-5` are pulled, swap `num-bigint` for `crypto-bigint` if it's pulled). Reassess in M6 and consider option 3 (hand-roll everything) if any pinned crate is yanked or picks up an advisory during V1 development. See `design/30-crate-topology.md:130` and `design/10-raw-layer.md:252`.
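
To give a sense of how small the RC4 fallback would be, here is a sketch of the textbook algorithm (key scheduling plus keystream generation). It is not code from this repo: the `Rc4` type and its method names are hypothetical, and a real fallback module would still need review against the NTLM call sites.

```rust
/// Textbook RC4 state: 256-byte permutation plus two indices.
pub struct Rc4 {
    s: [u8; 256],
    i: u8,
    j: u8,
}

impl Rc4 {
    /// Key-scheduling algorithm (KSA). Key must be 1..=256 bytes.
    pub fn new(key: &[u8]) -> Self {
        assert!(!key.is_empty() && key.len() <= 256, "invalid RC4 key length");
        let mut s = [0u8; 256];
        for (idx, slot) in s.iter_mut().enumerate() {
            *slot = idx as u8;
        }
        let mut j: u8 = 0;
        for i in 0..256 {
            j = j.wrapping_add(s[i]).wrapping_add(key[i % key.len()]);
            s.swap(i, j as usize);
        }
        Rc4 { s, i: 0, j: 0 }
    }

    /// XOR the keystream into `data` in place (encrypt and decrypt are identical).
    pub fn apply_keystream(&mut self, data: &mut [u8]) {
        for byte in data.iter_mut() {
            self.i = self.i.wrapping_add(1);
            self.j = self.j.wrapping_add(self.s[self.i as usize]);
            self.s.swap(self.i as usize, self.j as usize);
            let idx = self.s[self.i as usize].wrapping_add(self.s[self.j as usize]);
            *byte ^= self.s[idx as usize];
        }
    }
}

/// Convenience wrapper: one-shot transform of a buffer with a fresh state.
pub fn rc4_transform(key: &[u8], input: &[u8]) -> Vec<u8> {
    let mut out = input.to_vec();
    Rc4::new(key).apply_keystream(&mut out);
    out
}
```

A yank rehearsal for `rc4` could compare a module like this against the pinned crate on the widely published test vectors before switching over.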
|
||
|
||
**Settles when:** `cargo-deny check advisories` runs green in CI on a fresh advisory database, the workspace `Cargo.toml` pins are documented inline with their fallback plans, and a "yank rehearsal" (manually mark a pin as yanked locally and confirm the fallback compiles) has been done at least once per crate.
|
||
|
||
## Open questions
|
||
|
||
### Q1 — Where does the Rust workspace live? **(unresolved)**
|
||
|
||
`CLAUDE.md` proposes a sibling `rust/` directory at `c:\Users\dohertj2\Desktop\mxaccess\rust\`, but this is a *proposal*, not a confirmation: a glob of `rust/` confirms zero files exist there today, and `CLAUDE.md` itself hedges with "when it is started." **M0 cannot start until this is confirmed.**
|
||
|
||
**Owner:** project lead.
|
||
|
||
**Action:** confirm the path `c:\Users\dohertj2\Desktop\mxaccess\rust\` or pick an alternative location; create the empty `rust/` directory (or sibling) before M0 begins.
|
||
|
||
**Current best answer:** still pending. The CLAUDE.md proposal is the default and is what M0 will assume unless overridden, but treat this as an open decision rather than a confirmed answer.
|
||
|
||
**Settles when:** the workspace directory exists on disk and contains a `Cargo.toml` (even an empty one).
|
||
|
||
### Q2 — License? **(resolved: MIT)**
|
||
|
||
The .NET reference has no LICENSE file at the repo root. The Rust crates need one before publish.
|
||
|
||
**Resolved (2026-05-05):** **MIT** (single-license, not the dual `MIT OR Apache-2.0`). All workspace deps verified MIT/Apache-2.0 compatible; MIT alone satisfies every dep's downstream license obligation. `LICENSE` file added at the project root (`c:\Users\dohertj2\Desktop\mxaccess\LICENSE`). All crate `Cargo.toml`s set `license = "MIT"` via `workspace.package`.
|
||
|
||
**Settles when:** N/A — resolved.
|
||
|
||
### Q3 — Cross-platform reach (Linux, macOS)
|
||
|
||
The codec, ASB net.tcp framing (binary message encoding, per R1), and the async session are theoretically portable. Galaxy SQL via `tiberius` works on Linux. NTLM works on Linux. DPAPI does not. Active Directory authentication on Linux requires `gssapi` (Kerberos), which is out of scope.
|
||
|
||
**Current best answer:** Linux is a **stretch goal** for V1, not a supported target — consistent with `30-crate-topology.md`'s `mxaccess-codec` Targets line ("stretch goal") and `60-roadmap.md`'s "What this roadmap deliberately does not include" (Linux behind feature flags). If pursued, the path is `default-features = false` with the consumer providing credentials and shared secret explicitly. macOS unsupported in V1 (no Galaxy SQL TDS testing on macOS).
|
||
|
||
**Settles when:** a Linux integration test runs successfully against a remote AVEVA install. Until then, treat Linux support as aspirational and gate all Linux-specific code paths behind opt-in feature flags.
|
||
|
||
### Q4 — How does `mxaccess-compat` handle COM event sinks?
|
||
|
||
The .NET `MxNativeCompatibilityServer` raises `OnDataChange` etc. as COM events. `mxaccess-compat` is a Rust API; do we expose them as `Stream`s, callbacks, or both?
|
||
|
||
**Current best answer:** Streams, with a separate optional `mxaccess-compat-com` crate (post-V1) that registers `windows-rs`-generated COM classes. The compat crate's primary surface is Rust.
|
||
|
||
**Settles when:** a concrete consumer requests COM exposure.
|
||
|
||
### Q5 — How do we surface `MxStatus` in `Subscription` items vs `Session` operations?
|
||
|
||
For `Session::write()`, a non-Ok status maps to `Error::Status`. For `Subscription::next()`, a non-Ok status comes through as `DataChange { status: MxStatus, ... }` — it is not necessarily an error (a "stale" data change is still a valid frame).
|
||
|
||
**Current best answer:** `Session::write()` returns `Err` on non-Ok category. `Subscription::next()` returns `Ok(DataChange { ... })` and the consumer inspects `change.status`. Documented in `50-error-model.md`.
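
A small sketch of the asymmetry, under loud assumptions: the `MxStatus` enum below is a hypothetical two-variant placeholder (the real type carries more structure), and `write_result` / `subscription_item` are illustrative free functions standing in for `Session::write()` and `Subscription::next()`.

```rust
/// Placeholder status: `Ok` or some non-Ok code. The real MxStatus is richer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MxStatus {
    Ok,
    Other(i32),
}

#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    Status(MxStatus),
}

/// A delivered data change; the status travels with the value.
#[derive(Debug, PartialEq)]
pub struct DataChange {
    pub status: MxStatus,
    pub value: f64,
}

/// Write path: a non-Ok status becomes an `Err`.
pub fn write_result(status: MxStatus) -> Result<(), Error> {
    if status == MxStatus::Ok {
        Ok(())
    } else {
        Err(Error::Status(status))
    }
}

/// Subscription path: the frame is valid even when the status is non-Ok,
/// so it is returned as `Ok` and the consumer inspects `change.status`.
pub fn subscription_item(status: MxStatus, value: f64) -> Result<DataChange, Error> {
    Ok(DataChange { status, value })
}
```

The design choice is that `Err` on the subscription path is reserved for transport or decode failures, not for data quality.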
|
||
|
||
**Settles when:** API stabilises after consumer feedback.
|
||
|
||
### Q6 — Should `Session` be `Clone`?
|
||
|
||
Cheap clones via `Arc<SessionInner>` are convenient (handlers can take `Session` by value). But cloning makes shutdown semantics fuzzy: when does `UnregisterEngine` fire?
|
||
|
||
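**Current best answer:** `Clone + Send + Sync`. Drop of the last clone requests `UnregisterEngine` best-effort by pushing onto the connection task's cleanup channel rather than calling `tokio::spawn` from `Drop` (see R15, option 2). `Session::shutdown(timeout)` is the explicit, awaitable way for production code.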
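
The "last clone" timing can be seen in a minimal sketch built on `Arc`. Everything here is illustrative: `SessionInner` holds only a counter that stands in for the `UnregisterEngine` request, and `demo` is a hypothetical helper.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Inner state shared by all clones; its Drop runs exactly once.
struct SessionInner {
    // Counter standing in for "an UnregisterEngine request was issued".
    unregister_calls: Arc<AtomicUsize>,
}

impl Drop for SessionInner {
    fn drop(&mut self) {
        self.unregister_calls.fetch_add(1, Ordering::SeqCst);
    }
}

/// Cheap-to-clone handle: cloning bumps the Arc refcount only.
#[derive(Clone)]
pub struct Session {
    #[allow(dead_code)]
    inner: Arc<SessionInner>,
}

/// Returns the cleanup count after dropping the first clone and after the last.
pub fn demo() -> (usize, usize) {
    let calls = Arc::new(AtomicUsize::new(0));
    let first = Session {
        inner: Arc::new(SessionInner { unregister_calls: Arc::clone(&calls) }),
    };
    let second = first.clone();

    drop(first);
    let after_first = calls.load(Ordering::SeqCst);

    drop(second);
    let after_last = calls.load(Ordering::SeqCst);

    (after_first, after_last)
}
```

Cleanup fires only when the final handle goes away, which is why an explicit `Session::shutdown(timeout)` is the clearer choice when handlers may still hold clones.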
|
||
|
||
**Settles when:** stress test under churn confirms drop semantics are correct.
|
||
|
||
### Q7 — M1 `hasDetailStatus` audit
|
||
|
||
During M1 wave-1 codec ports, the `subscription_message.rs` agent draft conditionally read the `status: i32` field only when `hasDetailStatus = true`, while requiring a minimum record length of 15 (DataUpdate) regardless. The result: 4 leading status bytes were left unconsumed, then misread as `quality` further down. The defect was caught by round-trip tests (`data_update_boolean_round_trip`, `data_update_has_no_correlation_id`) and fixed: `status: i32` is now read unconditionally per `src/MxNativeCodec/NmxSubscriptionMessage.cs:126-127`; only `detail_status: Option<i32>` is gated on `hasDetailStatus` (`NmxSubscriptionMessage.cs:130-134`).
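
The corrected read order can be sketched as below. This is an illustration of the defect pattern only, not the real `parse_record`: the little-endian byte order, the assumption that `detail_status` immediately follows `status`, and the names `StatusFields` and `parse_status_fields` are all assumptions made for the example.

```rust
/// The two status-related fields discussed above.
#[derive(Debug, PartialEq, Eq)]
pub struct StatusFields {
    pub status: i32,
    pub detail_status: Option<i32>,
}

/// Read a little-endian i32 at `offset`, or None if the buffer is too short.
fn read_i32_le(buf: &[u8], offset: usize) -> Option<i32> {
    let b = buf.get(offset..offset + 4)?;
    Some(i32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Returns the parsed fields and the number of bytes consumed.
pub fn parse_status_fields(buf: &[u8], has_detail_status: bool) -> Option<(StatusFields, usize)> {
    let mut offset = 0;

    // `status` is read unconditionally, regardless of the flag.
    let status = read_i32_le(buf, offset)?;
    offset += 4;

    // Only `detail_status` is gated on `has_detail_status`.
    let detail_status = if has_detail_status {
        let value = read_i32_le(buf, offset)?;
        offset += 4;
        Some(value)
    } else {
        None
    };

    Some((StatusFields { status, detail_status }, offset))
}
```

Returning the consumed byte count makes the original bug easy to catch in a test: with the flag off, exactly four bytes must be consumed, never zero.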
|
||
|
||
**Follow-up:** audit any other codec port (current or future) that takes a `has_detail_status` / `hasDetailStatus` parameter for the same defect pattern — specifically, verify that fields read unconditionally in the .NET source remain unconditional in the Rust port. Likely affected scope: any future helper that ports `ParseRecord` semantics from `NmxSubscriptionMessage.cs`. The inline note at `mxaccess-codec/src/subscription_message.rs` `parse_record` documents the fix.
|
||
|
||
**Settles when:** post-M1 audit confirms no other codec module conditionally skips fields the .NET reference reads unconditionally.
|
||
|
||
## Open evidence gaps
|
||
|
||
These are missing fixtures that the design assumes will land by their respective milestone.
|
||
|
||
| Fixture | Needed by | Captured how |
|
||
|---|---|---|
|
||
| ~~Multi-sample buffered batch~~ | ~~M6~~ | **CAPTURED (F44)** — `captures/094-frida-buffered-separate-writer/frida-events.tsv:145`; fixture under `crates/mxaccess-codec/tests/fixtures/m6-buffered/` |
|
||
| ~~Cross-domain NTLM Type1/2/3~~ | ~~M2+~~ | **DEFERRED (R8)** — permanently external-blocked; needs multi-domain Windows lab not available on this dev host |
|
||
| Activate/Suspend transition (wire) | M6 / F46 | **PARTIAL (F44 + F46)** — client-side conditions documented from capture 077; F46 added Frida hooks (`LmxProxy.dll!CLMXProxyServer.Suspend/.Activate` at RVAs `0x13d9c` / `0x14028`); live re-run pending (F50) |
|
||
| `OperationComplete` for non-write op | indefinitely | unknown |
|
||
| ~~Ghidra mapping table for completion-only bytes (R3/R4)~~ | ~~indefinitely~~ | **NO TABLE EXISTS (R3/R4 settled 2026-05-06)** — `analysis/ghidra/exports/Lmx.dll.aadct-decompile.md` confirms `aaDCT` is a logging BSTR name, not a table; `LmxProxy.dll`'s Fire_* event handlers receive already-populated `MXSTATUS_PROXY[]` from per-event context synthesis upstream, not from a static lookup. Verbatim preservation is the canonical answer. |
|
||
| ASB write timestamp + status fields | M5 | extended ASB Write/PublishWriteComplete probe |
|
||
| ASB no-communication source-level evidence (`work_remain.md:198`) | M5 | live capture against an unconfigured ASB endpoint |
|
||
| Partial-cleanup behavior after channel failure (`work_remain.md:196-197`) | M4/M5 | inject mid-flight failure during subscribe, observe cleanup state |
|
||
| Galaxy schema older version | indefinitely | not in scope for V1 |
|
||
|
||
## Things that look risky but aren't
|
||
|
||
### "Decode the NDR-bridge to find the value bytes"
|
||
|
||
`docs/Transport-Correlation.md:65-70` notes that distinct value probes do not appear in raw TCP — the `CNmxAdapter::PutRequest`/`CNmxAdapter::TransferData` buffers are an "internal adapter representation, not the TCP wire format." This is because the values flow as DCE/RPC stub bytes *inside* the `TransferData` payload, which itself is the 46-byte envelope plus the inner write/advise/subscribe body. The "bridge" is just our codec re-applied at a different boundary; once we encode the envelope correctly, the bytes are there.
|
||
|
||
The .NET reference confirms this — `src/MxNativeClient/ManagedNmxService2Client.cs:159-183` (`TransferData` + `ValidateTransferDataBody`) writes the 46-byte envelope directly into the DCE/RPC Request stub body, then forwards the inner; the validator explicitly rejects bodies that lack "an inner message after the 46-byte envelope" (line 182). There is no extra layer. The probe-vs-pcap mismatch is an artefact of not reassembling the inner body, not a missing protocol layer.
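
A minimal sketch of the boundary being described, assuming only what the text states (a 46-byte envelope followed by a non-empty inner message). The names `ENVELOPE_LEN`, `MissingInner`, and `split_transfer_data_body` are hypothetical and do not mirror the .NET validator's actual signature.

```rust
/// Envelope length stated in the text above.
pub const ENVELOPE_LEN: usize = 46;

/// Returned when a body has no inner message after the envelope.
#[derive(Debug, PartialEq, Eq)]
pub struct MissingInner {
    pub body_len: usize,
}

/// Split a TransferData body into (envelope, inner message).
/// Bodies of 46 bytes or fewer are rejected because the inner part would be empty.
pub fn split_transfer_data_body(body: &[u8]) -> Result<(&[u8], &[u8]), MissingInner> {
    if body.len() <= ENVELOPE_LEN {
        return Err(MissingInner { body_len: body.len() });
    }
    Ok(body.split_at(ENVELOPE_LEN))
}
```

The inner slice is where the write/advise/subscribe body lives, which is why value bytes show up only after the envelope is peeled off.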
|
||
|
||
**No risk.** Documented for clarity so future contributors don't chase a non-existent encryption layer.
|
||
|
||
### "We need a custom TLB / proxy DLL"
|
||
|
||
The .NET reference avoids registering a custom TLB by hand-rolling the callback `IRemUnknown` server in `src/MxNativeClient/ManagedCallbackExporter.cs:44-54` (`CreateCallbackObjRef` builds an OBJREF in memory) plus `src/MxNativeClient/ManagedCallbackExporter.cs:164,195-196` (the `IRemUnknown::RemQueryInterface` server-side handler returns the negotiated `INmxSvcCallback` IPID without any registry-resident TLB or proxy/stub DLL). The Rust port does the same in `mxaccess-callback`. The only registry touchpoint is OXID resolution (read-only) and reading the ASB shared secret (read-only via DPAPI). No installer, no admin elevation.
|
||
|
||
**No risk.** Documented because it commonly comes up in DCOM contexts.
|