From bedad57b4e2611903aaa7f78fda30cae7fef0922 Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Wed, 6 May 2026 04:09:03 -0400 Subject: [PATCH] design/followups: move F18 (M5 meta-tracker) to Resolved MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Trim the planning content (sub-stream breakdown table, parallel-safety analysis, risk-driven sequencing, "Resolves when" gate) since M5 is done. Keep the closure verdict, M5 DoD checklist showing the actual state at close, sub-followup closeout list (F19-F26 + F28/F29/F30/ F31/F32/F33/F34), cumulative execution log, and the architectural note explaining why AsbSession stays parallel to the NMX Session rather than unified — that's load-bearing for future maintenance. Open section now contains only F3 (cross-domain NTLM Type1/2/3 fixture, permanently external-blocked on this single-domain dev host — resolution requires multi-domain Windows lab not available here). Co-Authored-By: Claude Opus 4.7 (1M context) --- design/followups.md | 93 ++++++++++++--------------------------------- 1 file changed, 25 insertions(+), 68 deletions(-) diff --git a/design/followups.md b/design/followups.md index d74d2ae..c9b33f7 100644 --- a/design/followups.md +++ b/design/followups.md @@ -6,74 +6,6 @@ move to `## Resolved` with a date + commit hash. ## Open -### F18 — M5 plan of attack (ASB transport, parallel-safe sub-streams) -**Status:** Resolved 2026-05-06 — all sub-followups F19–F26 closed, M5 functionally LIVE end-to-end (`asb-subscribe` returns the real tag value over the wire). Kept under Open as the milestone-tracker reference for the cumulative execution log; the structural-port followups it spawned (F2/F10/F11/F27 for NTLM/DCOM/RemUnknown/DH) all resolved separately. **Severity:** P0 — milestone driver, blocks ASB consumers + V1 release -**Source:** `design/dependencies.md:73-89` + `design/60-roadmap.md:84-91` + `design/70-risks-and-open-questions.md:5-25` (R1 estimates ~3000 LoC for framing+encoders). - -**Scope.** Build the ASB data-plane end-to-end: -- `mxaccess-asb-nettcp` — `[MS-NMF]` framing + `[MC-NBFX]` binary-XML node codec + `[MC-NBFS]` static dictionary table + DH/HMAC/AES authentication crypto. -- `mxaccess-asb` — `IASBIDataV2` client (Connect, RegisterItems, Read, Write, PublishWriteComplete, CreateSubscription, AddMonitoredItems, Publish, Disconnect) + `SecretProvider` trait + DPAPI default impl + ASB Variant codec port (currently a stub at `crates/mxaccess-codec/src/lib.rs:74,77,80`). -- `mxaccess::Session` over an `AsbTransport` impl; capabilities surface ASB limits (no `subscribe_buffered`, no Activate/Suspend, no OperationComplete outside the proven write-completion frame — see `design/60-roadmap.md:88`). -- `examples/asb-subscribe.rs` exercises the whole path against a live ASB endpoint with parity vs `dotnet run --project src\MxAsbClient.Probe`. - -**Sub-stream breakdown** (matches `design/dependencies.md:78-89`). Each sub-stream is a separate followup so it can be claimed by a separate agent in a worktree without merge conflict: - -| Sub-followup | Stream | Owns | Depends on | -|---|---|---|---| -| F19 | (workspace prereq) | Add the M5 dep set to `rust/Cargo.toml` workspace deps + per-crate `Cargo.toml`: `aes`, `hmac`, `md-5`, `sha1`, `sha2`, `pbkdf2`, `flate2`, `rand`, `crypto-bigint` (constant-time DH per `review.md` MAJOR), `quick-xml`, `tokio-util`. Pinned to the `digest 0.11`/`cipher 0.5` generation per `design/30-crate-topology.md:251-289`. Sequential prereq for the others. | M0 | -| F20 | A — MS-NMF framing | `mxaccess-asb-nettcp::nmf` — preamble (`0x00 ver=1 mode=2 via=encoded-string`), preamble-ack, sized-envelope (`0x06 var-int len bytes`), end (`0x07`), fault (`0x08`), upgrade-request, known-encoding via lookup. Reliable-session ack handling. Round-trip against `analysis/proxy/mxasbclient-register-message.txt` and `mxasbclient-probe-stage*.txt` byte traces. | F19 | -| F21 | B — MC-NBFX | `mxaccess-asb-nettcp::nbfx` — record types (`0x40` ShortElement, `0x41` Element, `0x44` ShortDictionaryAttribute, `0x04` PrefixDictionary*A-Z, `0x84` BoolText, `0x88` Int32Text, `0x86` BoolFalseText, etc., per `[MC-NBFX]` §2.2). Length-prefixed strings (var-int 7-bit groups). Read/write over `bytes::BytesMut`. | F19 | -| F22 | C — MC-NBFS | `mxaccess-asb-nettcp::nbfs` — the static dictionary table. SOAP/WS-Addressing tokens + `IASBIDataV2`-action strings used by the operation set (`http://ASB.IDataV2:registerItemsIn`, `:readIn`, `:writeIn`, `:createSubscriptionIn`, `:publishIn`, etc., see `src/MxAsbClient/AsbContracts.cs:14-58`). Hand-rolled from the proven action set; the full WCF dictionary is much larger but only the action subset is on the wire. | F19 | -| F23 | D — Auth crypto | `mxaccess-asb-nettcp::auth` — port `src/MxAsbClient/AsbSystemAuthenticator.cs` (167 LoC): DH key exchange with `crypto-bigint` constant-time `mod_exp` (review.md MAJOR finding — .NET `BigInteger.ModPow` is **not** constant-time and the DH private exponent is long-lived per `cs:153-166`); HMAC-MD5/SHA1/SHA512 (negotiated per `AsbSolutionCryptoParameters.HashAlgorithm`); AES-128 with PBKDF2-SHA1 1000-iteration key derivation; deflate-then-encrypt `EncryptBaktun` vs raw-encrypt `EncryptApollo` distinguished by `:V2` lifetime suffix (`cs:48`); ASCII salt `"ArchestrAService"`; UTF-16LE passphrase. Plus DPAPI shared-secret read on Windows behind the existing `dpapi` feature gate, with a `SecretProvider::shared_secret(&[u8])` escape hatch for tests/CI (`design/30-crate-topology.md:150`). | F19 | -| F24 | (codec) | `mxaccess-codec::asb_variant` — fill in the stubbed `AsbVariant`, `AsbStatus`, `RuntimeValue` (`crates/mxaccess-codec/src/lib.rs:74,77,80`) per `docs/ASB-Variant-Wire-Format.md`. Decode/encode for the proven type matrix: `TypeBool`, `TypeInt32`, `TypeFloat`, `TypeDouble`, `TypeString`, `TypeDateTime`, `TypeDuration`, plus deployed array shapes (`work_remain.md:108-113`). Less-common scalars stay as raw bytes (matches .NET `DecodeVariant` fallback at `MxAsbDataClient.cs:748`). Independent of the framing/encoder work — separate crate. | M1 (envelope/status types) | -| F25 | E — IASBIDataV2 client | `mxaccess-asb::client` — top-level `AsbClient` with `connect`, `register_items`, `read`, `write`, `publish_write_complete`, `create_subscription`, `add_monitored_items`, `publish`, `disconnect`. Wires the contract → NBFX-encoded SOAP envelope → NMF-framed TCP. `ConnectedRequest::ConnectionValidator` HMAC signing per `AsbSystemAuthenticator::Sign`. Receives `Publish` callbacks via a long-lived background task (mirrors the M4 NMX `callback_router` pattern). Depends on F20+F21+F22+F23+F24. | A+B+C+D+codec | -| F26 | (session) | `mxaccess::Session` over `AsbTransport`. New transport impl alongside `NmxTransport`. Surface ASB capability flags so `subscribe_buffered`/`activate`/`suspend` return `Error::Unsupported(Capability::*)` rather than a runtime fallthrough. Update `examples/asb-subscribe.rs` to drive the path end-to-end. Live-probe DoD: round-trip parity with `dotnet run --project src\MxAsbClient.Probe`. | F25 | - -**Parallel-safety analysis.** -- F19 (workspace deps) is the **single sequential bottleneck** — F20-F25 all reference workspace deps that don't exist yet, so they cannot start in parallel until F19 lands. Tight & small (~30 lines of TOML). -- F20, F21, F22, F23, F24 are **fully parallel-safe** after F19: each owns a different module under a different crate (or different sibling module within `mxaccess-asb-nettcp`). No shared state, no cross-import — each can land in its own commit. Per `dependencies.md:88` "Peak agents in parallel: 4 in the framing/encoding wave (A+B+C+D)". -- F25 is sequential after the four framing/encoder streams + F24 land — it composes them. The .NET `MxAsbDataClient` is monolithic enough that splitting F25 across agents costs more in coordination than it saves. -- F26 is sequential after F25. -- **Cross-milestone parallelism still holds.** M5 (this whole F18-F26 cluster) runs in parallel with M3+M4 per `design/60-roadmap.md:14-17` because the `Transport` trait was lifted into M0. M4's `Session` core landed (commits `4863c6d`, `2dc091d`, `a31237d`); the F26 `AsbTransport` plugs into the same trait without re-design. - -**Risk-driven sequencing inside the parallel wave.** R1 in `design/70-risks-and-open-questions.md:9` is the project-blocker. Of the four parallel streams, F23 (auth crypto) carries the most live-probe risk (DH handshake against the live VM is the first irreversible test of the spec port) but is the smallest in LoC. F22 (NBFS) is the largest unknown — the dictionary table size is bounded only by the action subset we exercise. Recommended order *if* agents are constrained: F23 (smallest, highest-leverage) → F20 (foundational for any wire test) → F21 (encoder) → F22 (dictionary) → F24 (codec, independent). - -**Definition of done** for F18 as a whole (= M5 DoD per `design/60-roadmap.md:91`): -1. `cargo run -p mxaccess --example asb-subscribe -- --tag TestChildObject.TestInt` succeeds against a live ASB endpoint. -2. Round-trip parity with `dotnet run --project src\MxAsbClient.Probe` (Frida/Wireshark diff is byte-identical for the proven type matrix). -3. The `mxaccess-asb` type matrix covers what `work_remain.md:108-113` documents as proven: scalar Boolean, Int32, Float, Double, String, DateTime, Duration plus deployed array tags. -4. `cargo build --workspace` and `cargo test --workspace` green; `cargo clippy --workspace -- -D warnings` clean. - -**Resolves when:** F19-F26 are all closed and the four DoD bullets above pass. - -**M5 STATUS (commit `9063f10`): functionally LIVE.** End-to-end `cargo run -p mxaccess --example asb-subscribe -- --tag TestChildObject.TestInt` Connect → AuthenticateMe → Register → Read → Disconnect against the live MxDataProvider, returning the real tag value over the wire (`type_id=4 length=4 payload=[99,0,0,0]`). DoD checklist: -1. ✅ Live `asb-subscribe` succeeds against the AVEVA endpoint. -2. ⚠️ Wire structure matches .NET's request bytes for AuthenticateMe / Register byte-by-byte (verified via `asb-relay` middleman with the .NET probe routed through ClientVia); responses round-trip via the F30 dict-id resolution post-pass. Strict byte-identical parity for the response side is not guaranteed because WCF chunks `Bytes8/16/32` records at different boundaries — both forms are functionally equivalent and `collect_asbidata_payloads` concatenates chunks (commit `cf97eab`). -3. ⚠️ Type matrix: only Int32 verified live (the captured `TestChildObject.TestInt` tag). Bool / Float / Double / String / DateTime / Duration / arrays not yet exercised — pending one or more sample tags per type and an `asb-subscribe` extension that loops over them. F32 captures this expansion. -4. ✅ `cargo build --workspace` + `cargo test --workspace` (711 tests) + `cargo clippy --workspace -- -D warnings` all green. - -**M5 closeout status: all sub-followups resolved.** -- ~~F32~~: resolved via option (b) — three-type live coverage is the deployable maximum; missing types are Galaxy-provisioning-gated. -- ~~F28~~: resolved (commit ``) — all 13 `ConnectedRequest` shapes now sign over byte-identical canonical XML. The 8 remaining ops (Read / Write / Subscription family / etc.) ported in commit ``, fixture-tested byte-equal vs the .NET probe's `--dump-signed-xml` output. Live read still passes after the cutover. Hardens the transport against `hashAlgorithm`-non-empty deployments. -- ~~F29~~: resolved — `nbfs.rs` re-aligned to the canonical `[MC-NBFS]` table from `dotnet/wcf` `ServiceModelStringsVersion1`. -- ~~F26 stream subscription~~: resolved — `AsbSession::subscribe(subscription_id)` returns an `AsbSubscription: Stream>` driven by an internal publish-loop. -- ~~F33~~: resolved — InvalidConnectionId tolerance pattern propagated to all 8 remaining response decoders + the F26 stream's publish-loop terminates cleanly on server-side rejection. - -**Cumulative execution log.** F19 + F23 (`ed17c07`); F24 (`7611d9e`); F20 (`9dfd193`); F22 (`43c10a1`); F21 (`5f98558`); F25 step 1 (`25dbd8d`); F25 step 2 (`a2b8989`); F25 step 3 (`c4bf0a0`); F25 step 4 (`1e59249`); F25 step 5 (`9b8133f`); F25 step 6 (`321b796`); F25 step 7 (`1b1ee1e`); F26 step 1 (`8a0f92b`); F26 step 2 (`14bb529`); example rewrite (`c6570dc`); F25 step 8 (`b543eb1`); F25 step 9 (`0441a2e`); F25 step 10 (`9876b4e`); F26 step 3 (``); **F25 live-bring-up reconciliation** (this commit): -- F25 live-bring-up reconciliation: live `asb-subscribe` + `asb-relay` (TCP middleman) capture-and-diff against AVEVA's MxDataProvider on Windows. Five concrete fixes landed: - 1. **NBFX `PrefixElement_a..z` (0x5E-0x77) and `PrefixAttribute_a..z` (0x26-0x3F) decode + encode arms** — single-letter-prefix records that WCF emits in responses but our codec only recognised the dictionary-named cousins (`PrefixDictionaryElement_a..z` 0x44-0x5D, `PrefixDictionaryAttribute_a..z` 0x0C-0x25). The server's ConnectResponse hit `0x65 = PrefixElement_h` for a dynamically-named element (e.g. ``) and our decoder bailed with `unknown NBFX record byte 0x65`. Both directions now round-trip; the encoder picks the short-form arm whenever `prefix_letter_offset(prefix).is_some()`. - 2. **xmlns redeclaration on `` and `` inside `AuthenticationData` / `PublicKey`** — `[XmlType(Namespace = "http://asb.contracts.data/20111111")]` on the AuthenticationData / PublicKey classes (`AsbContracts.cs:350-381`) means XmlSerializer emits an `xmlns="..."` redeclaration on each direct child. The default-ns scope ends at ``, so `` needs its own redeclaration to stay in the data namespace; without it the server fell back to messages-namespace and the deserialiser threw an `InternalServiceFault`. Connect handshake now completes end-to-end with the apollo:V2 ConnectionLifetime and a real ServicePublicKey. - 3. **SOAP-fault detection on the response path** — `ClientError::SoapFault { action, code, reason }` surfaces when the response Action header matches the canonical `dispatcher/fault` template; we previously let body decoders blindly run and hit `MissingField { field: "Status" }` which masked the fact that the wire was a fault. The reason text is extracted as the longest `NbfxText::Chars` in the body — robust against the `nbfs.rs` static-dictionary id mismatches noted below. - 4. **Identified blocker**: `ConnectedRequest` signing currently HMACs the **NBFX wire bytes** of the unsigned envelope. .NET's `AsbSystemAuthenticator.Sign` (`AsbSystemAuthenticator.cs:79`) HMACs `Encoding.UTF8.GetBytes(request.ToXml())` — the **canonical XML serialisation** of the message contract via `XmlSerializer` with namespace `"urn:invensys.schemas"` (`AsbSerialization.cs:12-48`). Until the Rust port emits identical XML bytes, the HMAC mismatches and the server rejects every signed request (`AuthenticateMe`, `RegisterItems`, etc.) with a generic `dispatcher/fault` InternalServiceFault. Connect itself is unsigned (extends `ServiceMessage`, no `ConnectionValidator` header) which is why it works today. The fault's `a:RelatesTo` UniqueId in our captures matches the AuthenticateMe `MessageID`, confirming the failure point. **New followup F28** captures the XML-canonicaliser scope. - 5. **`nbfs.rs` static dictionary ids drift** at id 114+ vs. the canonical `[MC-NBFS]` table (`Fault`/`Code`/`Reason`/`Text`/`Value` are 20 IDs higher on the wire than what we encode). Doesn't affect requests we send (we only encode IDs ≤44 = `ReplyTo`, all correct), but breaks `decode_envelope`'s element-by-name matching for fault bodies. Tracked as **F29**. - - Workspace: 702 tests pass (no test count delta — wire-only fixes). Live status: Connect handshake working with real DH key + apollo encryption; AuthenticateMe and onwards blocked on F28. Companion diagnostic example `asb-relay.rs` (TCP middleman that hex-dumps both directions to stderr) lands as a permanent debugging aid. - - -- F26 step 3 (`` in the cumulative log): `mxaccess::AsbSession` is a high-level cheap-clone async API on top of `AsbTransport`, deliberately **parallel** to the NMX-shaped `Session` rather than unified. The NMX `Session` carries orchestration (`CallbackExporter`, callback router task, recovery broadcast, `INmxService2` mutex) that has no ASB analogue, and ASB's request/response loop over a single TCP stream maps naturally to `Mutex` — the two paths converge at the consumer-facing `mxaccess` API but stay distinct at the orchestration layer. `AsbSession` is `Clone + Send + Sync` via `Arc`, so each `clone()` is `O(1)` and the inner mutex serialises operation calls. - -For the per-step body of every line listed in the cumulative execution log, see the matching commit message — each commit is a single F-number step with its own scope, fixtures, test count delta, and follow-up notes. The detailed per-step write-ups previously inlined here added little beyond what `git show ` provides. - ### F3 — Cross-domain NTLM Type1/2/3 fixture **Severity:** P2 **Status:** Permanently out-of-scope on the current dev host (no second AD domain). Resolution requires external infrastructure not available here. @@ -85,6 +17,31 @@ For the per-step body of every line listed in the cumulative execution log, see ## Resolved +### F18 — M5 plan of attack (ASB transport, parallel-safe sub-streams) +**Resolved:** 2026-05-06 — all sub-followups F19–F26 closed plus F28 / F29 / F30 / F31 / F32 / F33 / F34 layered on top. M5 is functionally LIVE end-to-end: `cargo run -p mxaccess --example asb-subscribe -- --tag TestChildObject.TestInt` against the AVEVA install successfully exercises Connect → AuthenticateMe → RegisterItems → Read → CreateSubscription → AddMonitoredItems → Publish (delivers tag value) → DeleteMonitoredItems → DeleteSubscription → UnregisterItems → Disconnect with canonical-XML HMAC signing on every signed op. **Severity:** P0 — milestone driver, blocks ASB consumers + V1 release. +**Source:** `design/dependencies.md:73-89` + `design/60-roadmap.md:84-91` + `design/70-risks-and-open-questions.md:5-25`. + +**M5 DoD per `design/60-roadmap.md:91`:** +1. ✅ `cargo run -p mxaccess --example asb-subscribe` succeeds against the live AVEVA endpoint — Read returns the real tag value, Publish stream delivers monitored values via the F26 stream (`AsbVariant { type_id: 4, length: 4, payload: [99, 0, 0, 0] }`). +2. ⚠️ Wire structure matches .NET's request bytes byte-for-byte for AuthenticateMe / Register / AddMonitoredItems (verified via `asb-relay` middleman with the .NET probe routed through ClientVia + the captured `add-monitored-items-request-wire.bin` fixture for F34). Strict byte-identical parity for the response side is not guaranteed because WCF chunks `Bytes8/16/32` records at different boundaries — both forms are functionally equivalent and `collect_asbidata_payloads` concatenates chunks (commit `cf97eab`). Canonical XML for the 13 signed ops is byte-equal to .NET's `XmlSerializer.Serialize` output (F28 fixture-comparison tests). +3. ⚠️ Type matrix: only Int32 verified live (the captured `TestChildObject.TestInt` tag). Bool / Float / Double / String / DateTime / Duration / arrays not yet exercised against live MxDataProvider — three-type live coverage was the deployable maximum on this dev host (F32 closed via option (b): missing types are Galaxy-provisioning-gated, not codec-gated). +4. ✅ `cargo build --workspace` + `cargo test --workspace` (758 tests) + `cargo clippy --workspace -- -D warnings` all green. + +**M5 sub-followup closeout:** +- ~~F19~~: workspace deps for `aes` / `hmac` / `md-5` / `sha1` / `sha2` / `pbkdf2` / `flate2` / `rand` / `crypto-bigint` / `quick-xml` / `tokio-util`. +- ~~F20~~ (NMF framing), ~~F21~~ (NBFX node codec), ~~F22~~ (NBFS static dictionary), ~~F23~~ (auth crypto), ~~F24~~ (`AsbVariant` codec), ~~F25~~ (`IASBIDataV2` client end-to-end), ~~F26~~ (`mxaccess::AsbSession` over `AsbTransport` + `Stream`). +- ~~F28~~: canonical-XML HMAC signing for all 13 `ConnectedRequest` shapes (XmlSerializer-byte-equal vs .NET fixtures; legacy NBFX-bytes fallback retired). +- ~~F29~~: `nbfs.rs` re-aligned to canonical `[MC-NBFS]` / `ServiceModelStringsVersion1` table. +- ~~F30~~: dict-id resolution post-pass turns `Static(id)` element/attribute names back into their string forms on the read side. +- ~~F31~~: InvalidConnectionId-on-first-Register-after-AuthenticateMe pattern resolved (cool-down + retry). +- ~~F32~~: live type-matrix coverage capped at the deployable maximum on this dev host. +- ~~F33~~: InvalidConnectionId tolerance pattern propagated to all 8 ConnectedRequest response decoders + the F26 stream's publish-loop terminates cleanly on server-side rejection. +- ~~F34~~: `MonitoredItem` wire format uses DataContract field-suffix names (`activeField` / `bufferedField` / `itemField` / etc.) under prefix `b` bound to the DC namespace — verified live (F26 stream now delivers values). + +**Cumulative execution log.** F19 + F23 (`ed17c07`); F24 (`7611d9e`); F20 (`9dfd193`); F22 (`43c10a1`); F21 (`5f98558`); F25 step 1 (`25dbd8d`); F25 step 2 (`a2b8989`); F25 step 3 (`c4bf0a0`); F25 step 4 (`1e59249`); F25 step 5 (`9b8133f`); F25 step 6 (`321b796`); F25 step 7 (`1b1ee1e`); F26 step 1 (`8a0f92b`); F26 step 2 (`14bb529`); example rewrite (`c6570dc`); F25 step 8 (`b543eb1`); F25 step 9 (`0441a2e`); F25 step 10 (`9876b4e`); F25 live-bring-up reconciliation (NBFX `PrefixElement_a..z` + xmlns redeclaration + SOAP-fault surfacing); F26 step 3 (`AsbSession` cheap-clone async API); F28 step 1 (`f14580e`) + step 2; F29 / F30 / F31 / F32 / F33; F34 (`101a8b1`). For per-step detail, see the matching commit message — `git show ` is the authoritative record. + +**Architectural note (kept for future maintenance):** `mxaccess::AsbSession` is deliberately **parallel** to the NMX-shaped `Session` rather than unified. The NMX `Session` carries orchestration (`CallbackExporter`, callback router task, recovery broadcast, `INmxService2` mutex) that has no ASB analogue, and ASB's request/response loop over a single TCP stream maps naturally to `Mutex` — the two paths converge at the consumer-facing `mxaccess` API but stay distinct at the orchestration layer. `AsbSession` is `Clone + Send + Sync` via `Arc`, so each `clone()` is `O(1)` and the inner mutex serialises operation calls. + ### F34 — `MonitoredItem` wire format: DataContract field-suffix names, not XmlSerializer property names **Resolved:** 2026-05-06 (commit `101a8b1`). **Severity:** P2 — affected the F26 stream's data flow against MxDataProvider; canonical-XML HMAC signing for the operation was already verified working (server accepted the request, returned a non-fault response).