From 16f2c148e53f14dbd8449a505d85b631cf872390 Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Tue, 5 May 2026 06:34:30 -0400 Subject: [PATCH] design: parallelism map + /loop driver prompt + followups triage MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - design/dependencies.md: per-milestone parallelism map for M2–M6 with per-phase agent budgets (peak 4 in parallel for M5 framing wave; 7-agent maximum if M2 wave 1 + M5 framing run concurrently). - design/prompt.md: self-contained /loop driver. Step 0 triages design/followups.md (auto-resolves items whose preconditions are met, shelves the rest). Step 3 spawns parallel general-purpose agents per design/dependencies.md when the active wave has multiple lanes. Sequential lanes (M4 Session core, M5 client integration) run directly. Local-commit-only by default; explicit stop conditions; Q7 hasDetailStatus audit reminder for any new conditional-read codec port. - design/README.md: index updated to reference prompt.md, followups.md, dependencies.md, and review.md. design/followups.md is intentionally not pre-created — prompt.md Step 0 bootstraps it on first /loop run. Co-Authored-By: Claude Opus 4.7 (1M context) --- design/README.md | 6 + design/dependencies.md | 169 +++++++++++++++++++++++ design/prompt.md | 294 +++++++++++++++++++++++++++++++++++++++++ 3 files changed, 469 insertions(+) create mode 100644 design/dependencies.md create mode 100644 design/prompt.md diff --git a/design/README.md b/design/README.md index 7db34e7..fed4119 100644 --- a/design/README.md +++ b/design/README.md @@ -14,6 +14,10 @@ The folder is structured as a small set of focused documents. 
Read in order; eac | `50-error-model.md` | `MxStatus`, error types, panic/cancellation policy | | `60-roadmap.md` | Milestones M0..M6, validation strategy | | `70-risks-and-open-questions.md` | Parity gaps, unproven flows, cross-platform constraints | +| `dependencies.md` | Cross- and within-milestone parallelism map; agent budget per phase | +| `review.md` | Adversarial review log (BLOCKER/MAJOR/MINOR/NIT findings, all resolved) | +| `prompt.md` | `/loop` driver prompt for autonomous M2–M6 execution | +| `followups.md` | Open / resolved deferred work items; auto-triaged by `prompt.md` Step 0 (created on first /loop run if missing) | The design is grounded in the .NET reference at `src/` and the protocol artifacts in `docs/`, `analysis/`, and `captures/`. **Do not introduce protocol behavior in these documents that is not already proven in the reference.** When adding a new claim about wire format, cite either: @@ -29,3 +33,5 @@ This folder is documentation, not code. When the Rust workspace is created, the - Protocol question: 40 first, then the relevant section of 10. - API question: 20 first, then 50. - Planning a milestone: 60 first, cross-reference 70 for blockers. +- Scheduling concurrent work: `dependencies.md` for the per-phase parallelism map. +- Driving M2–M6 autonomously via `/loop`: `prompt.md` (and the `followups.md` triage log it maintains). diff --git a/design/dependencies.md b/design/dependencies.md new file mode 100644 index 0000000..6fb3efc --- /dev/null +++ b/design/dependencies.md @@ -0,0 +1,169 @@ +# Dependencies and parallelism map + +Where the M2–M6 work can be run in parallel, where it can't, and the agent +budget per phase. Sits alongside [`60-roadmap.md`](60-roadmap.md) — the +roadmap describes what each milestone delivers and its DoD; this file +describes the dependency graph **inside and across** milestones so multiple +agents (or developers) can be scheduled without stepping on each other. 
+ +## Cross-milestone parallelism + +Already encoded in the roadmap's "Sequencing dependencies" table. The headline: + +``` +M0 ─► M1 ─► M2 ─► M3 ─► M4 ─┐ + │ ├─► M6 ─► release + └─────────► M5 ─────┘ +``` + +**M5 (entire ASB path) runs in parallel with M3+M4 (entire NMX path).** This +is only possible because the cluster-4 sequencing fix moved the `Transport` +trait + `Session` shape to M0 — they are stable enough at M0 that M5 can +build against the trait without waiting for M4 to land the NMX impl. ASB has +no transitive dependency on DCE/RPC, NTLM, OBJREF, or OXID. + +The other dependency edges are tight: M3 cannot run in parallel with M2 (it +needs the live RPC transport to drive `register_engine_2`); M4 cannot run in +parallel with M3 (the async session wraps the raw NMX client built in M3). + +## Within-milestone parallelism + +### M2 — DCE/RPC + NTLM + OBJREF + OXID + callback exporter + +| Wave | Parallelizable streams | Why they're independent | +|---|---|---| +| **1** | **(a) NTLMv2 client context · (b) DCE/RPC PDU codec · (c) OBJREF parser** | All pure-codec/crypto, no I/O, no shared state. Each maps cleanly to one Rust module under `mxaccess-rpc`. | +| 2 | (d) OXID resolution · (e) `IRemUnknown::RemQueryInterface` | Both depend on (b) but not on each other. | +| 3 | (f) Callback exporter (the `mxaccess-callback` crate — `INmxSvcCallback` server, `IRemUnknown` server, OBJREF export) | Depends on (a), (b), (e). Single crate, single agent. | + +**Peak agents in parallel: 3** (wave 1). Each agent owns one `.cs` source +family in `src/MxNativeClient/` and emits one Rust module. 
+ +### M3 — NMX session + Galaxy resolver + +| Stream | Owns | Depends on | +|---|---|---| +| **A** | `mxaccess-galaxy`: SQL resolver (`tag_name`-form input only — `wwtools/grdb/`), user resolver, `dbo.schema_version` startup probe | M0 + M1 (`MxReferenceHandle` for the output type) | +| **B** | `mxaccess-nmx`: `NmxClient` with `register_engine_2`, `transfer_data`, `add_subscriber_engine`, `set_heartbeat_send_interval`, `unregister_engine`, `get_partner_version`. Builds `MxReferenceHandle` from resolver output + CRC-16/IBM. | M2 (DCE/RPC + callback exporter) | + +A and B are fully independent — different crates, different `.cs` reference +sources, different external dependencies. **2 agents in parallel.** B can be +sub-paralleled per opnum (4 small tasks for the four primary methods) if a +third agent is available. + +### M4 — Async Tokio façade (NMX path) + +This is the milestone where parallelism helps least. The `Session` +orchestration layer is genuinely sequential — the recovery state machine, +the connection task that owns the TCP stream + callback channel, and the +correlation-ID bookkeeping are one cross-cutting design that's hard to chunk +across agents without integration pain. + +| Wave | Parallelizable | Notes | +|---|---|---| +| 1 | (a) `Session` core + long-lived connection task · (b) `RecoveryPolicy` + `RecoveryEvent` types | (b) is small but design-pivotal — agree the event shape before consumers depend on it. | +| 2 | (c) write family: `write`, `write_with_completion`, `write_with_timestamp`, `write_secured`, `write_secured_at` · (d) subscribe family: `read`, `subscribe`, `subscribe_many`, `subscribe_buffered` | After (a) lands. (c) and (d) share the connection task but operate on disjoint state. | +| 3 | All 7 example programs (`connect-write-read.rs`, `subscribe.rs`, `subscribe-buffered.rs`, `recovery.rs`, `multi-tag.rs`, `secured-write.rs`, `asb-subscribe.rs`) | Pure consumer code, no API impact. Can split to one agent per example. 
| + +**Peak agents in parallel: 2** in wave 2 (write-family vs subscribe-family). +Don't try to split tighter — the connection task has too much shared mutable +state (subscription registry, in-flight correlation table, recovery flag). + +### M5 — ASB transport + +The heaviest milestone in raw LoC after the `wwtools/mxaccesscli/` +verification. The R1 estimate (`70-risks-and-open-questions.md`) puts it at +~3000 LoC for the framing + encoder layers alone. It splits cleanly along +spec boundaries: + +| Stream | Owns | +|---|---| +| **A** | `[MS-NMF]` net.tcp framing — record types (preamble, preamble-ack, sized-envelope, end, fault) + reliable-session ack handling on the underlying TCP channel | +| **B** | `[MC-NBFX]` binary-XML node codec — read/write tokenised XML (start-element, end-element, attribute, text, etc.) | +| **C** | `[MC-NBFS]` static dictionary table — the SOAP/WS-Addressing/`IASBIDataV2` action strings the encoder references by ID instead of inlining | +| **D** | Application auth: DH key exchange (constant-time `crypto-bigint` rather than the .NET `BigInteger.ModPow` defect) + HMAC integrity + AES-128 + DPAPI shared-secret read on Windows | +| **E** | `mxaccess-asb` client: `Connect`, `RegisterItems`, `Read`, `Write`, `CreateSubscription`, `AddMonitoredItems`, `Publish`, `Disconnect`. Depends on A+B+C+D. | + +**Peak agents in parallel: 4** in the framing/encoding wave (A+B+C+D), then +E is sequential (or 2-way: read/write paths vs subscription paths). + +### M6 — Compat shim + production hardening + +Fully parallel — the four streams are different crates or different +feature gates, no inter-stream design coupling. + +| Stream | Owns | +|---|---| +| **A** | `mxaccess-compat`: `LMXProxyServer`-shaped methods layered on top of `Session`. Streams + async fns; the `mxaccess-compat-com` (post-V1) registers `windows-rs`-generated COM classes on top. 
| +| **B** | Performance pass: `bytes::Bytes` zero-copy on receive paths, `BytesMut` pre-allocation per session, codec allocation count benchmarked, hits R12's `< 5 allocations per write at steady state` target. | +| **C** | `metrics` feature: counters + histograms via the `metrics` crate. Optional, not on the default-feature path. | +| **D** | Docs + release: `cargo doc`, `cargo public-api` baseline, README polish, `cargo publish` per crate in topological order. | + +**Peak agents in parallel: 4.** Each owns a different module or feature, no +shared mutable state. + +## Practical agent budget + +| Phase | Peak parallel agents | Sequential bottleneck | +|---|---|---| +| M2 | 3 (wave 1) | callback exporter (wave 3) | +| M3 | 2 | live-probe DoD (single AVEVA install) | +| M4 | 2 | `Session` core + connection task | +| M5 | 4 (framing wave) | client (E) | +| M6 | 4 | none — release sequencing only | + +If running as agents-in-parallel-per-wave the way M1 ran, peak utilization +is **4 agents** (M5 framing wave). The honest sequential bottleneck is M4's +`Session` orchestration — that's the one milestone where parallelism doesn't +help much because the recovery state machine is one tightly-coupled design. + +## Wall-clock estimate + +Strictly sequential (one developer, one stream): roughly the M2–M6 LoC +volume divided by sustained Rust output. Realistic estimate ~12–16 weeks +for V1 from M2 start. + +Aggressive parallelism (M5 in parallel with M3+M4 + within-milestone agent +fan-out): roughly **60% of the sequential wall-clock**, ~7–10 weeks. Past +that point, coordination overhead and integration debugging eat the gains. + +The biggest single win is the M5-parallel-with-M3+M4 lane: ~3–4 weeks saved +on its own. Within-milestone parallelism saves a further ~1–2 weeks per +milestone but flattens out fast — splitting M4 into 4 streams is not 4× +faster than 2 streams. 
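
The parallel figure above is plain arithmetic on the sequential range. A minimal sketch, assuming the 60% factor applies uniformly to both ends of the 12–16 week estimate (the function name is illustrative, not part of the workspace):

```rust
/// Scale a (low, high) sequential estimate in weeks by a parallel factor.
/// The 0.6 factor and the 12-16 week range come from the text above; the
/// uniform scaling of both ends is an assumption.
fn parallel_estimate(sequential_weeks: (f64, f64), factor: f64) -> (f64, f64) {
    (sequential_weeks.0 * factor, sequential_weeks.1 * factor)
}
```

Calling `parallel_estimate((12.0, 16.0), 0.6)` gives roughly 7.2 to 9.6 weeks, which is where the rounded "~7–10 weeks" comes from.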
+ +## Constraints that block further parallelism + +These bottlenecks are **not** within our control; they are listed so they don't +get treated as parallelization opportunities: + +1. **Live-probe DoDs need a single live AVEVA install.** Two agents can't + both probe `register_engine_2` against the same `NmxSvc.exe` at the same + time — the second one races against the first's RPC channel. + Live-probing is sequential per shared resource. +2. **Captured-fixture round-trip tests are CPU-bound but trivially small.** + Not worth parallelizing the runner. +3. **Cross-cutting design decisions** (error taxonomy, recovery semantics, + `tracing` field naming) need to land before consumer code can be + written. These are "wave 0" of each milestone — single-agent, fast. +4. **`cargo publish` ordering** is a true topological sort (codec before + transport before session); it cannot be parallelized. + +## Recommended sequencing + +If picking which lane to push next given the M0+M1 state today: + +1. **M2 wave 1 (3 agents)** — NTLM, DCE/RPC PDU codec, OBJREF parser. Highest + parallelism return, foundational for everything else. +2. **M5 framing wave (4 agents) in parallel with M2 wave 1** — only if you + have agent budget. The M5 framing wave ships to `mxaccess-asb-nettcp` and + M2 wave 1 ships to the M2 crates; they don't overlap. **This is the maximum-parallelism configuration — + 7 agents working concurrently.** +3. **M3 stream A (Galaxy resolver) in parallel with M2 wave 3** — Galaxy + doesn't need the RPC transport; it can be developed while the callback + exporter is being built. +4. **M4 wave 1 (Session core + RecoveryPolicy)** — sequential after M3 + stream B lands. +5. **M6 (4 agents)** — once both M4 and M5 land. + +Beyond step 5, the work is release-engineering, not feature work.
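
The cross-milestone graph can also be checked mechanically. The sketch below is illustrative only: it encodes the edges from the diagram in "Cross-milestone parallelism" and computes the earliest wave in which each milestone could start. The `M5 -> M0` edge is an assumption inferred from the note that the `Transport` trait and `Session` shape moved to M0; the function names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Milestone -> milestones it depends on. The M5 -> M0 edge is an
/// assumption read off the diagram, not a confirmed roadmap entry.
fn milestone_deps() -> BTreeMap<&'static str, Vec<&'static str>> {
    BTreeMap::from([
        ("M0", vec![]),
        ("M1", vec!["M0"]),
        ("M2", vec!["M1"]),
        ("M3", vec!["M2"]),
        ("M4", vec!["M3"]),
        ("M5", vec!["M0"]),
        ("M6", vec!["M4", "M5"]),
    ])
}

/// Earliest wave per milestone: 0 with no dependencies, otherwise one more
/// than the latest wave among its dependencies. Assumes the graph is acyclic.
fn earliest_waves(
    deps: &BTreeMap<&'static str, Vec<&'static str>>,
) -> BTreeMap<&'static str, usize> {
    fn wave_of(
        node: &'static str,
        deps: &BTreeMap<&'static str, Vec<&'static str>>,
        memo: &mut BTreeMap<&'static str, usize>,
    ) -> usize {
        if let Some(&w) = memo.get(node) {
            return w;
        }
        let mut w = 0;
        for &d in deps[node].iter() {
            w = w.max(wave_of(d, deps, memo) + 1);
        }
        memo.insert(node, w);
        w
    }

    let mut memo = BTreeMap::new();
    for &node in deps.keys() {
        wave_of(node, deps, &mut memo);
    }
    memo
}
```

Under these assumed edges M5 has no path through M2, M3, or M4, so only M6 has to wait for it. That matches the claim that the ASB path can overlap the NMX path.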
diff --git a/design/prompt.md b/design/prompt.md new file mode 100644 index 0000000..451fa88 --- /dev/null +++ b/design/prompt.md @@ -0,0 +1,294 @@ +# `/loop` driver — autonomous M2–M6 implementation + +You are inside a `/loop` iteration for the **mxaccess Rust port** at +`c:\Users\dohertj2\Desktop\mxaccess`. Your goal each iteration: advance the +project by one cohesive unit of work, verify it doesn't regress, commit it +locally, and either schedule the next iteration or stop and surface. + +This file is re-fed to you on every iteration with no carry-over state. Read +it top-to-bottom each time. Discover everything else from the project itself. + +--- + +## Iteration protocol + +### Step 0 — Triage `design/followups.md` + +Read `design/followups.md`. If it does not exist, create it with this +skeleton: + +```markdown +# Followups + +Open work items deferred during /loop iterations. Triaged at the top of +every iteration. New items are appended under `## Open`; resolved items +move to `## Resolved` with a date + commit hash. + +## Open + +(none yet) + +## Resolved + +(none yet) +``` + +Then for each item under `## Open`: + +1. Read the **`Resolves when:`** clause. +2. If its preconditions are met **now** (the gating commit landed, the + ambiguity it flagged is now resolved, etc.) → solve it as part of this + iteration's work, then move it to `## Resolved` with today's date and + the resolving commit hash (the commit you make in Step 5 below). +3. If preconditions are **not** met → leave it under `## Open` and move on. + +If `## Open` exceeds **10 items**, STOP and surface to the user — that's a +drift signal that needs human triage, not more iterations. + +### Step 1 — Discover state + +Run in parallel (single assistant message, multiple tool calls): + +- `git log --oneline -20` +- `cd rust && cargo test --workspace 2>&1 | grep "test result:" | grep -v "0 passed" | tail -5` — must show all-pass results. 
+- Read `design/60-roadmap.md`, `design/dependencies.md`, `design/review.md`. + +If `cargo test` is **not green** at the start of the iteration: + +1. Diagnose with `cargo build` and the failing test name. +2. Apply **one** targeted fix. +3. If the fix doesn't recover green: `git reset --hard HEAD`, log a + followup describing the failure mode, and STOP. + +Never proceed past Step 1 with a red baseline. + +### Step 2 — Identify the current phase and unblocked lanes + +From the git log + recent commits, determine which milestone is in flight: + +- Most recent `[M0]` / `[M1]` / `[M2]` / ... commits → that's the active phase. +- If the active phase's DoD (per `design/60-roadmap.md`) is fully met, the + next iteration's work is the **next** milestone. Advance the phase + marker mentally. + +Then consult `design/dependencies.md` for the active phase's parallelism map. +Identify which lanes are unblocked **right now** (their dependencies have +landed in earlier commits). + +The summary table for fast lookup: + +``` +M2 wave 1: 3 agents — NTLMv2 client / DCE/RPC PDU codec / OBJREF parser +M2 wave 2: 2 agents — OXID resolution / IRemUnknown::RemQueryInterface +M2 wave 3: 1 agent — mxaccess-callback (the INmxSvcCallback exporter) +M3: 2 agents — mxaccess-galaxy / mxaccess-nmx +M4 wave 1: 1 agent — Session core + RecoveryPolicy types (sequential) +M4 wave 2: 2 agents — write family / subscribe family +M4 wave 3: 7 agents — examples (one each) +M5: 4 agents — MS-NMF framing / MC-NBFX codec / MC-NBFS dictionary / DH+HMAC+AES +M5 client: 1 agent — mxaccess-asb operations (sequential after framing) +M6: 4 agents — mxaccess-compat / perf / metrics / docs +``` + +Pick the smallest unit that: +- Is currently unblocked (dependencies landed). +- Is not already covered by an open followup that's deferred. +- Can be completed in one iteration's work (one commit's worth). 
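
The three selection criteria above amount to a filter followed by a minimum. A rough sketch, assuming a hypothetical `Lane` record with a size estimate (neither the type nor its fields exist in the workspace; they only illustrate the rule):

```rust
/// Hypothetical description of one lane from the parallelism map.
#[derive(Debug, Clone, PartialEq)]
struct Lane {
    name: &'static str,
    depends_on: Vec<&'static str>,
    deferred_by_followup: bool,
    fits_one_commit: bool,
    estimated_loc: u32, // illustrative size measure, not a real metric
}

/// Pick the smallest lane that is unblocked, not deferred by an open
/// followup, and small enough for one commit. `landed` lists what has
/// already been committed.
fn pick_next<'a>(lanes: &'a [Lane], landed: &[&str]) -> Option<&'a Lane> {
    lanes
        .iter()
        .filter(|l| !landed.contains(&l.name))
        .filter(|l| l.depends_on.iter().all(|d| landed.contains(d)))
        .filter(|l| !l.deferred_by_followup)
        .filter(|l| l.fits_one_commit)
        .min_by_key(|l| l.estimated_loc)
}
```

Returning `None` here corresponds to the case where nothing is unblocked, which the stop conditions in Step 7 are meant to catch.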
+ +### Step 3 — Execute + +**If the active wave has multiple parallel streams** (any row above with +`N agents` where `N > 1`), spawn that many `general-purpose` agents in **one +single message** containing N parallel `Agent` tool calls. Each agent owns +one `.cs` source file (or one logical unit) and emits one Rust module. + +Each agent's prompt MUST include: + +- Project context (one paragraph: this is the mxaccess Rust port, + `src/` is the executable spec, CLAUDE.md forbids fabrication). +- The exact `.cs` source path to port. +- The exact Rust output module path. +- Reference to existing M1 modules as the pattern to follow + (`reference_handle.rs`, `envelope.rs`, `status.rs`). +- Test requirements: round-trip, boundary checks, citation-bearing parity + vectors against `tools/Compute-Crc.ps1` style helpers where applicable. +- Hard rule: do NOT edit `lib.rs`. The driver wires up modules after agents finish. +- Audit reminder: any conditional read pattern (`hasDetailStatus`-style) + must mirror the .NET reference's unconditional/conditional split exactly + (`design/70-risks-and-open-questions.md` Q7 — the M1 wave-1 audit defect). + +**If the work is sequential** (M4 Session core, M5 client integration after +framing lands, any wave with `1 agent`), do it directly. Read the .cs source, +port it inline, write tests inline. Do not spawn an agent for sequential +single-stream work. + +After parallel agents return: wire up `lib.rs` (mod declarations + re-exports) +and remove any stub types they replaced. + +### Step 4 — Verify + +Run all four DoD gates: + +- `cargo build --workspace --all-targets` +- `cargo test --workspace --no-fail-fast` +- `cargo clippy --workspace --all-targets -- -D warnings` +- `cargo fmt --all -- --check` + +For codec changes also verify the .NET parity test still passes: +- `cargo test -p mxaccess-codec --test dotnet_codec_parity` + +If any gate fails: + +1. Try **one** targeted fix. +2. If still failing → `git reset --hard HEAD`. 
Append a followup. STOP. + +### Step 5 — Commit (local only) + +- `git add -A`. +- Commit message format: + +``` +[M] : + + +- Test count delta: NNN → MMM (+K) +- Open followups touched: F, F (or "none") + +Co-Authored-By: Claude Opus 4.7 (1M context) +``` + +- **Do not push.** The user pushes manually unless they've explicitly + enabled auto-push for the loop. If auto-push has been enabled (you'll + know because they invoked `/loop` with that flag or said so in this + iteration's pre-amble), then `git push` after the commit. + +### Step 6 — Log new followups + +For each item this iteration **discovered but did not solve**, append to +`design/followups.md` under `## Open` using this schema: + +```markdown +### F +**Severity:** P0 | P1 | P2 | P3 +**Source:** commit +**Why deferred:** +**Resolves when:** +**Notes:** +``` + +`` is the next free integer (highest existing F-number + 1, or 1 on +first followup). + +Common follow-up sources: +- An agent reported a deviation from the .NET reference that requires a + separate design-decision turn. +- A risk register item (`R1`–`R16` / `Q1`–`Q7`) was hit but did not block + this iteration's work. +- A test fixture is missing and would require a new live capture. +- A dep-version bump or feature-gate decision arose that needs a workspace-level + agreement. + +### Step 7 — Decide next + +| Condition | Action | +|---|---| +| Milestone DoD fully satisfied (per `60-roadmap.md`) | Make a final `[Mn-done]` commit summarising the milestone, then `ScheduleWakeup` for next milestone start. | +| Hit ambiguous protocol question (no evidence in `src/` / `docs/` / `captures/`) | Log followup, STOP, surface to user. | +| Hit P0/P1 blocker tracked in `70-risks-and-open-questions.md` | Log followup, STOP, surface. | +| 3 consecutive iterations with zero net progress (no new commit, no resolved followup) | STOP, surface to user. | +| `## Open` followups list > 10 items | STOP, ask user to triage. 
| +| M6 DoD fully satisfied | Project complete. STOP. Do not schedule. | +| Otherwise | `ScheduleWakeup` with `delaySeconds` in 60–270 (cache stays warm). | + +When you call `ScheduleWakeup`, pass the literal sentinel +`<>` as `prompt` so the runtime re-resolves these +instructions. Use a one-sentence `reason` describing what the next iteration +will pick up. + +--- + +## Hard rules (do not negotiate) + +- **No fabricated protocol behaviour.** Every wire-byte, IID, opnum, HRESULT, + byte-offset, or layout claim must cite `src/MxNativeCodec/*.cs:LINE`, + `src/MxNativeClient/*.cs:LINE`, `src/MxAsbClient/*.cs:LINE`, `docs/*.md:LINE`, + `analysis/frida/*.tsv`, or `captures/0NN-frida-*`. If you can't cite, you + can't claim. Log a followup instead. +- **No `--force`, `--no-verify`, `--no-gpg-sign`** on git commands. +- **No amending pushed commits.** Always create new commits. +- **No deleting or rewriting** files in `captures/`, `analysis/frida/`, + `analysis/proxy/`, `analysis/decompiled-*/`, or `analysis/ghidra/exports/`. + These are evidence per CLAUDE.md. +- **No editing `lib.rs` from an agent.** The driver wires up modules. +- **No skipping verification.** All four cargo gates green or revert. +- **No pushing without authorization.** Commit locally only by default. +- **Preserve unknown bytes.** Match the .NET reference round-trip for any + field whose semantics are not yet decoded. Use `[u8; N]` preservation + fields and document with a `:LINE` citation. +- **Tests over assertions.** Do not add `assert!(true)` or `assert!()` + at runtime; use `const _: () = assert!(...)` for compile-time checks. +- **Conditional reads must match the .NET reference exactly** — see + `design/70-risks-and-open-questions.md` Q7 (`hasDetailStatus` audit). Any + field the .NET reads unconditionally must be read unconditionally in Rust. 
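
To make the "preserve unknown bytes" and compile-time-assert rules concrete, here is a minimal sketch. The record layout is entirely hypothetical (it is not an mxaccess wire structure, and the little-endian byte order is an assumption); a real port would carry a `:LINE` citation on each field.

```rust
/// Hypothetical 8-byte record: a decoded status, four bytes whose meaning is
/// not yet known, and a decoded length. Not a real mxaccess layout.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ExampleRecord {
    status: u16,
    /// Preserved verbatim so encode(decode(x)) == x even though the
    /// semantics are undecoded.
    unknown: [u8; 4],
    length: u16,
}

const ENCODED_LEN: usize = 8;

// Compile-time check that the field sizes add up to the encoded length.
const _: () = assert!(ENCODED_LEN == 2 + 4 + 2);

impl ExampleRecord {
    /// Every field is read unconditionally; returns None on short input.
    fn decode(buf: &[u8]) -> Option<Self> {
        if buf.len() < ENCODED_LEN {
            return None;
        }
        Some(Self {
            status: u16::from_le_bytes([buf[0], buf[1]]),
            unknown: [buf[2], buf[3], buf[4], buf[5]],
            length: u16::from_le_bytes([buf[6], buf[7]]),
        })
    }

    /// Writes the unknown bytes back exactly as they were read.
    fn encode(&self) -> [u8; ENCODED_LEN] {
        let mut out = [0u8; ENCODED_LEN];
        out[0..2].copy_from_slice(&self.status.to_le_bytes());
        out[2..6].copy_from_slice(&self.unknown);
        out[6..8].copy_from_slice(&self.length.to_le_bytes());
        out
    }
}
```

A round-trip test over captured bytes is then a single equality check between the input and `decode(..).encode()`, which is the property the rule is protecting.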
+ +--- + +## Self-check before scheduling next iteration + +Before calling `ScheduleWakeup`, verify each: + +- [ ] `cargo build --workspace --all-targets` exited 0. +- [ ] `cargo test --workspace` exited 0 with all-pass results. +- [ ] `cargo clippy --workspace --all-targets -- -D warnings` exited 0. +- [ ] `cargo fmt --all -- --check` exited 0. +- [ ] One commit landed this iteration (verify with `git log -1 --oneline`). +- [ ] If any followup was deferred this iteration, `design/followups.md` + has the new entry. +- [ ] `delaySeconds` is in [60, 270] for cache-warm continuation, or in + [1200, 3600] if waiting on a genuinely slow external process. + +If any item fails, do **not** schedule next iteration. Surface to the user. + +--- + +## Useful commands reference + +```powershell +# State discovery +cd c:\Users\dohertj2\Desktop\mxaccess +git log --oneline -20 +git status --short + +# Rust gates (run from rust/) +cd rust +cargo build --workspace --all-targets +cargo test --workspace --no-fail-fast +cargo clippy --workspace --all-targets -- -D warnings +cargo fmt --all -- --check +cargo test -p mxaccess-codec --test dotnet_codec_parity + +# Live-probe gating (M3+ only) +. ..\tools\Setup-LiveProbeEnv.ps1 +cargo test -p mxaccess --features live -- --ignored + +# .NET parity helpers +dotnet build src\MxNativeCodec\MxNativeCodec.csproj +pwsh -NoProfile -File tools\Compute-Crc.ps1 +``` + +--- + +## Why this design + +- **Step 0 first** keeps `followups.md` from rotting. Items that became + solvable get cleaned up immediately. +- **Step 1's red-baseline check** prevents iterations from compounding bugs. +- **Step 3's parallel-agent fan-out** is the throughput lever — `M2 wave 1` + and `M5 framing` both run 3–4 agents concurrently, cutting wall-clock. +- **Step 5's local-commit-only** default is reversibility. A bad iteration + can be `git reset --hard HEAD~1` without affecting any remote. +- **Step 7's stop conditions** are explicit and disjoint. 
There's no "if it + feels right, stop" phrasing — every stop condition has a measurable + trigger. +- **Hard rules** are lifted from `CLAUDE.md` so the loop cannot drift even + if a single iteration loses context.