Dependencies and parallelism map
Where the M2–M6 work can be run in parallel, where it can't, and the agent
budget per phase. Sits alongside 60-roadmap.md — the
roadmap describes what each milestone delivers and its DoD; this file
describes the dependency graph inside and across milestones so multiple
agents (or developers) can be scheduled without stepping on each other.
Cross-milestone parallelism
Already encoded in the roadmap's "Sequencing dependencies" table. The headline:
M0 ─► M1 ─► M2 ─► M3 ─► M4 ─┐
        │                   ├─► M6 ─► release
        └─────────► M5 ─────┘
M5 (entire ASB path) runs in parallel with M3+M4 (entire NMX path). This
is only possible because the cluster-4 sequencing fix moved the Transport
trait + Session shape to M0 — they are stable enough at M0 that M5 can
build against the trait without waiting for M4 to land the NMX impl. ASB has
no transitive dependency on DCE/RPC, NTLM, OBJREF, or OXID.
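To illustrate why a trait that is stable at M0 decouples the two lanes, here is a minimal sketch. Only the names `Transport` and `Session` come from this document; the method `round_trip`, the `LoopbackTransport` type, and the synchronous signatures are assumptions made for illustration, and the project's real trait is presumably async and shaped differently.

```rust
use std::io;

/// Hypothetical shape of the shared transport abstraction. The real trait
/// lives in the project's M0 code and may look quite different.
pub trait Transport {
    /// Send one request frame and return the raw response frame.
    fn round_trip(&mut self, request: &[u8]) -> io::Result<Vec<u8>>;
}

/// Session code is generic over the trait, so it compiles against any impl.
pub struct Session<T: Transport> {
    transport: T,
}

impl<T: Transport> Session<T> {
    pub fn new(transport: T) -> Self {
        Session { transport }
    }

    /// Illustrative call path: forwards bytes to whichever transport is plugged in.
    pub fn call(&mut self, request: &[u8]) -> io::Result<Vec<u8>> {
        self.transport.round_trip(request)
    }
}

/// Stand-in for a second lane's transport: it can be written and tested
/// before any other implementation of the trait exists.
pub struct LoopbackTransport;

impl Transport for LoopbackTransport {
    fn round_trip(&mut self, request: &[u8]) -> io::Result<Vec<u8>> {
        // Echo the request; a real transport would talk to the network.
        Ok(request.to_vec())
    }
}
```

Because `Session` only names the trait, an ASB-side implementation and an NMX-side implementation can be developed by different agents without either waiting on the other.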
The other dependency edges are tight: M3 cannot run in parallel with M2 (it
needs the live RPC transport to drive register_engine_2); M4 cannot run in
parallel with M3 (the async session wraps the raw NMX client built in M3).
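The edges described above can be checked mechanically. The sketch below encodes the diagram as a small graph and treats two milestones as parallelizable when neither is reachable from the other. The M1 -> M5 branch point is read off the diagram and should be treated as an assumption; the function names are illustrative.

```rust
/// Milestones M0..M6, indexed by number.
const MILESTONES: usize = 7;

/// Direct dependency edges (from, to): the NMX chain
/// M0 -> M1 -> M2 -> M3 -> M4 -> M6 plus the ASB lane M1 -> M5 -> M6.
/// The M1 -> M5 branch point is an assumption taken from the diagram.
const EDGES: [(usize, usize); 7] = [
    (0, 1), (1, 2), (2, 3), (3, 4), (4, 6), (1, 5), (5, 6),
];

/// Returns true if `to` is reachable from `from`, i.e. `to` depends on `from`.
fn depends_on(from: usize, to: usize) -> bool {
    let mut stack = vec![from];
    let mut seen = [false; MILESTONES];
    while let Some(node) = stack.pop() {
        if node == to {
            return true;
        }
        if seen[node] {
            continue;
        }
        seen[node] = true;
        for &(a, b) in EDGES.iter() {
            if a == node {
                stack.push(b);
            }
        }
    }
    false
}

/// Two distinct milestones can overlap only if neither depends on the other.
pub fn can_run_in_parallel(a: usize, b: usize) -> bool {
    a != b && !depends_on(a, b) && !depends_on(b, a)
}
```

With these edges, `can_run_in_parallel(5, 3)` and `can_run_in_parallel(5, 4)` hold, while M2/M3 and M3/M4 are rejected, matching the prose.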
Within-milestone parallelism
M2 — DCE/RPC + NTLM + OBJREF + OXID + callback exporter
| Wave | Parallelizable streams | Why they're independent |
|---|---|---|
| 1 | (a) NTLMv2 client context · (b) DCE/RPC PDU codec · (c) OBJREF parser | All pure-codec/crypto, no I/O, no shared state. Each maps cleanly to one Rust module under mxaccess-rpc. |
| 2 | (d) OXID resolution · (e) IRemUnknown::RemQueryInterface | Both depend on (b) but not on each other. |
| 3 | (f) Callback exporter (the mxaccess-callback crate: INmxSvcCallback server, IRemUnknown server, OBJREF export) | Depends on (a), (b), (e). Single crate, single agent. |
Peak agents in parallel: 3 (wave 1). Each agent owns one .cs source
family in src/MxNativeClient/ and emits one Rust module.
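The wave numbers in the table follow directly from the stream dependencies. As a sketch, assuming the dependencies are exactly those listed in the table (streams a to f) and that each stream is listed after everything it depends on, a wave is one more than the latest wave among a stream's dependencies. The names `M2_STREAMS`, `assign_waves`, and `peak_parallelism` are illustrative.

```rust
use std::collections::HashMap;

/// M2 streams and the streams each one depends on, taken from the table.
pub const M2_STREAMS: &[(&str, &[&str])] = &[
    ("a", &[]),
    ("b", &[]),
    ("c", &[]),
    ("d", &["b"]),
    ("e", &["b"]),
    ("f", &["a", "b", "e"]),
];

/// Assigns each stream a wave: 1 if it has no dependencies, otherwise one
/// more than the latest wave among its dependencies. Panics if a stream is
/// listed before one of its dependencies.
pub fn assign_waves(
    streams: &[(&'static str, &'static [&'static str])],
) -> HashMap<&'static str, usize> {
    let mut waves: HashMap<&'static str, usize> = HashMap::new();
    for &(name, deps) in streams {
        let latest_dep = deps.iter().map(|dep| waves[dep]).max().unwrap_or(0);
        waves.insert(name, latest_dep + 1);
    }
    waves
}

/// Peak parallelism is the size of the largest wave.
pub fn peak_parallelism(waves: &HashMap<&'static str, usize>) -> usize {
    let mut counts: HashMap<usize, usize> = HashMap::new();
    for &wave in waves.values() {
        *counts.entry(wave).or_insert(0) += 1;
    }
    counts.values().copied().max().unwrap_or(0)
}
```

Running `assign_waves(M2_STREAMS)` places a, b, c in wave 1, d and e in wave 2, and f in wave 3, so the peak is 3 agents.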
M3 — NMX session + Galaxy resolver
| Stream | Owns | Depends on |
|---|---|---|
| A | mxaccess-galaxy: SQL resolver (tag_name-form input only; wwtools/grdb/), user resolver, dbo.schema_version startup probe | M0 + M1 (MxReferenceHandle for the output type) |
| B | mxaccess-nmx: NmxClient with register_engine_2, transfer_data, add_subscriber_engine, set_heartbeat_send_interval, unregister_engine, get_partner_version. Builds MxReferenceHandle from resolver output + CRC-16/IBM. | M2 (DCE/RPC + callback exporter) |
A and B are fully independent: different crates, different .cs reference
sources, different external dependencies. That allows 2 agents in parallel.
B can be split further per opnum (4 small tasks for the four primary
methods) if a third agent is available.
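Stream B mentions CRC-16/IBM. For reference, a minimal sketch of the variant most commonly given that name (CRC-16/ARC: polynomial 0x8005, reflected, zero initial value, no final XOR) is shown below. Which exact variant the project uses, and which bytes of the resolver output feed it, are not stated in this document, so treat both as assumptions.

```rust
/// CRC-16/ARC (often called CRC-16/IBM): polynomial 0x8005 processed in
/// reflected form (0xA001), initial value 0x0000, no final XOR.
/// Assumption: this is the variant meant by "CRC-16/IBM" in the table.
pub fn crc16_ibm(data: &[u8]) -> u16 {
    let mut crc: u16 = 0x0000;
    for &byte in data {
        // Fold the next input byte into the low half of the register.
        crc ^= byte as u16;
        for _ in 0..8 {
            if crc & 1 != 0 {
                crc = (crc >> 1) ^ 0xA001;
            } else {
                crc >>= 1;
            }
        }
    }
    crc
}
```

The standard check input for this variant is the ASCII string `123456789`, for which the function returns `0xBB3D`.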
M4 — Async Tokio façade (NMX path)
This is the milestone where parallelism helps least. The Session
orchestration layer is genuinely sequential — the recovery state machine,
the connection task that owns the TCP stream + callback channel, and the
correlation-ID bookkeeping are one cross-cutting design that's hard to chunk
across agents without integration pain.
| Wave | Parallelizable | Notes |
|---|---|---|
| 1 | (a) Session core + long-lived connection task · (b) RecoveryPolicy + RecoveryEvent types | (b) is small but design-pivotal: agree on the event shape before consumers depend on it. |
| 2 | (c) write family: write, write_with_completion, write_with_timestamp, write_secured, write_secured_at · (d) subscribe family: read, subscribe, subscribe_many, subscribe_buffered | After (a) lands. (c) and (d) share the connection task but operate on disjoint state. |
| 3 | All 7 example programs (connect-write-read.rs, subscribe.rs, subscribe-buffered.rs, recovery.rs, multi-tag.rs, secured-write.rs, asb-subscribe.rs) | Pure consumer code, no API impact. Can split to one agent per example. |
Peak agents in parallel: 2 in wave 2 (write-family vs subscribe-family). Don't try to split tighter — the connection task has too much shared mutable state (subscription registry, in-flight correlation table, recovery flag).
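The shared state named above can be pictured with a small sketch. Everything here is hypothetical: the struct, its field types, and the behavior during recovery are invented for illustration and use plain `std` collections rather than the project's Tokio-based design. The point is only that the write family and the subscribe family touch different fields, while recovery touches all of them.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the state owned by the connection task.
#[derive(Default)]
pub struct ConnectionState {
    /// Tags with an active subscription (subscribe family).
    subscriptions: HashSet<String>,
    /// Requests awaiting a reply, keyed by correlation ID (write family).
    in_flight: HashMap<u32, String>,
    /// Next correlation ID to hand out.
    next_correlation_id: u32,
    /// Set while the recovery state machine is re-establishing the session.
    recovering: bool,
}

impl ConnectionState {
    /// Write-family entry point: records the request and returns its
    /// correlation ID, or None while recovery is in progress (an
    /// illustrative policy, not the project's documented behavior).
    pub fn begin_write(&mut self, tag: &str) -> Option<u32> {
        if self.recovering {
            return None;
        }
        let id = self.next_correlation_id;
        self.next_correlation_id = self.next_correlation_id.wrapping_add(1);
        self.in_flight.insert(id, tag.to_string());
        Some(id)
    }

    /// Called when a reply arrives; returns the tag the reply belongs to.
    pub fn complete(&mut self, correlation_id: u32) -> Option<String> {
        self.in_flight.remove(&correlation_id)
    }

    /// Subscribe-family entry point: returns true if the tag was newly added.
    pub fn subscribe(&mut self, tag: &str) -> bool {
        self.subscriptions.insert(tag.to_string())
    }

    /// Recovery is cross-cutting: it sets the flag, drops in-flight requests,
    /// and returns the subscriptions that would need to be replayed.
    pub fn start_recovery(&mut self) -> Vec<String> {
        self.recovering = true;
        self.in_flight.clear();
        self.subscriptions.iter().cloned().collect()
    }
}
```

Splitting `begin_write`/`complete` from `subscribe` between two agents is workable because they use disjoint fields; `start_recovery` is the part that resists a further split.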
M5 — ASB transport
The heaviest milestone in raw LoC after the wwtools/mxaccesscli/
verification. The R1 estimate (70-risks-and-open-questions.md) puts it at
~3000 LoC for the framing + encoder layers alone. It splits cleanly along
spec boundaries:
| Stream | Owns |
|---|---|
| A | [MC-NMF] net.tcp framing: record types (preamble, preamble-ack, sized-envelope, end, fault) + reliable-session ack handling on the underlying TCP channel |
| B | [MC-NBFX] binary-XML node codec — read/write tokenised XML (start-element, end-element, attribute, text, etc.) |
| C | [MC-NBFS] static dictionary table — the SOAP/WS-Addressing/IASBIDataV2 action strings the encoder references by ID instead of inlining |
| D | Application auth: DH key exchange (constant-time crypto-bigint rather than the .NET BigInteger.ModPow defect) + HMAC integrity + AES-128 + DPAPI shared-secret read on Windows |
| E | mxaccess-asb client: Connect, RegisterItems, Read, Write, CreateSubscription, AddMonitoredItems, Publish, Disconnect. Depends on A+B+C+D. |
Peak agents in parallel: 4 in the framing/encoding wave (A+B+C+D), then E is sequential (or 2-way: read/write paths vs subscription paths).
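As a flavor of how small and self-contained the codec pieces in streams A and B are, here is a sketch of the variable-length integer that [MC-NBFX] calls MultiByteInt31: seven payload bits per byte, least-significant group first, with the high bit marking continuation. The function names are illustrative and not taken from the project.

```rust
/// Encodes a value (0..=0x7FFF_FFFF) as a MultiByteInt31: seven bits per
/// byte, least-significant group first, high bit set on every byte but the last.
pub fn encode_mb_int31(mut value: u32, out: &mut Vec<u8>) {
    assert!(value <= 0x7FFF_FFFF, "value does not fit in 31 bits");
    loop {
        let low = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            out.push(low);
            return;
        }
        out.push(low | 0x80);
    }
}

/// Decodes a MultiByteInt31 from the front of `input`.
/// Returns the value and the number of bytes consumed, or None if the input
/// is truncated, runs past five bytes, or overflows 31 bits.
pub fn decode_mb_int31(input: &[u8]) -> Option<(u32, usize)> {
    let mut value: u32 = 0;
    for (i, &byte) in input.iter().enumerate().take(5) {
        // The fifth byte may carry only the top three bits and no continuation.
        if i == 4 && byte > 0x07 {
            return None;
        }
        value |= ((byte & 0x7F) as u32) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

For example, 145 encodes to the two bytes `0x91 0x01`, and decoding those bytes returns `(145, 2)`. A pure function pair like this has no I/O and no shared state, which is what lets A, B, C, and D proceed independently.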
M6 — Compat shim + production hardening
Fully parallel — the four streams are different crates or different feature gates, no inter-stream design coupling.
| Stream | Owns |
|---|---|
| A | mxaccess-compat: LMXProxyServer-shaped methods layered on top of Session. Streams + async fns; the mxaccess-compat-com crate (post-V1) registers windows-rs-generated COM classes on top. |
| B | Performance pass: bytes::Bytes zero-copy on receive paths, BytesMut pre-allocation per session, and codec allocation count benchmarked against R12's target of fewer than 5 allocations per write at steady state. |
| C | metrics feature: counters + histograms via the metrics crate. Optional, not on the default-feature path. |
| D | Docs + release: cargo doc, cargo public-api baseline, README polish, cargo publish per crate in topological order. |
Peak agents in parallel: 4. Each owns a different module or feature, no shared mutable state.
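The pre-allocation idea in stream B can be sketched with the standard library alone. The example below uses `Vec<u8>` as a stand-in for `BytesMut`, and the `WriteEncoder` type and its toy frame layout are hypothetical. It shows the general technique: allocate a scratch buffer once per session and clear it between writes so that steady-state encoding reuses the same allocation.

```rust
/// Hypothetical per-session encoder that owns one reusable scratch buffer.
pub struct WriteEncoder {
    scratch: Vec<u8>,
}

impl WriteEncoder {
    /// Pre-allocates the scratch buffer once, at session setup.
    pub fn with_capacity(capacity: usize) -> Self {
        WriteEncoder { scratch: Vec::with_capacity(capacity) }
    }

    /// Encodes a toy frame (2-byte little-endian length + payload) into the
    /// scratch buffer. `clear` keeps the allocation, so no new heap
    /// allocation happens as long as the frame fits the existing capacity.
    /// The length is truncated to 16 bits; this is a toy layout only.
    pub fn encode(&mut self, payload: &[u8]) -> &[u8] {
        self.scratch.clear();
        let len = payload.len() as u16;
        self.scratch.extend_from_slice(&len.to_le_bytes());
        self.scratch.extend_from_slice(payload);
        &self.scratch
    }

    /// Current capacity of the scratch buffer, useful for checking reuse.
    pub fn capacity(&self) -> usize {
        self.scratch.capacity()
    }
}
```

Comparing `capacity()` before and after repeated `encode` calls is a cheap way to confirm that small writes are not growing the buffer; an allocation-counting benchmark would be the real measurement for the R12 target.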
Practical agent budget
| Phase | Peak parallel agents | Sequential bottleneck |
|---|---|---|
| M2 | 3 (wave 1) | callback exporter (wave 3) |
| M3 | 2 | live-probe DoD (single AVEVA install) |
| M4 | 2 | Session core + connection task |
| M5 | 4 (framing wave) | client (E) |
| M6 | 4 | none — release sequencing only |
If running as agents-in-parallel-per-wave the way M1 ran, peak utilization
within a single milestone is 4 agents (M5 framing wave); overlapping
milestones can push the total higher. The honest sequential bottleneck is
M4's Session orchestration: that's the one milestone where parallelism
doesn't help much because the recovery state machine is one tightly-coupled
design.
Wall-clock estimate
Strictly sequential (one developer, one stream): roughly the M2–M6 LoC volume divided by sustained Rust output. Realistic estimate ~12–16 weeks for V1 from M2 start.
Aggressive parallelism (M5 in parallel with M3+M4 + within-milestone agent fan-out): roughly 60% of the sequential wall-clock, ~7–10 weeks. Past that point, coordination overhead and integration debugging eat the gains.
The biggest single win is the M5-parallel-with-M3+M4 lane: ~3–4 weeks saved on its own. Within-milestone parallelism saves a further ~1–2 weeks per milestone but flattens out fast — splitting M4 into 4 streams is not 4× faster than 2 streams.
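As a quick check on the 60% figure, applying it to both ends of the sequential estimate gives:

```latex
\[
0.6 \times 12\ \text{weeks} = 7.2\ \text{weeks},
\qquad
0.6 \times 16\ \text{weeks} = 9.6\ \text{weeks}
\]
```

Rounded, that is the ~7–10 week range quoted for the aggressive-parallelism case.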
Constraints that block further parallelism
These bottlenecks are outside our control; they are listed so they don't get treated as parallelization opportunities:
- Live-probe DoDs need a single live AVEVA install. Two agents can't both probe register_engine_2 against the same NmxSvc.exe at the same time; the second one races against the first's RPC channel. Live-probing is sequential per shared resource.
- Captured-fixture round-trip tests are CPU-bound but trivially small. Not worth parallelizing the runner.
- Cross-cutting design decisions (error taxonomy, recovery semantics, tracing field naming) need to land before consumer code can be written. These are "wave 0" of each milestone: single-agent, fast.
- cargo publish ordering is a true topological sort (codec before transport before session); cannot be parallelized.
Recommended sequencing
If picking which lane to push next given the M0+M1 state today:
1. M2 wave 1 (3 agents): NTLM, DCE/RPC PDU codec, OBJREF parser. Highest parallelism return, foundational for everything else.
2. M5 framing wave (4 agents) in parallel with M2 wave 1, only if you have agent budget. The M5 framing work ships to mxaccess-asb-nettcp and the M2 work ships to mxaccess-rpc, so they don't overlap. This is the maximum-parallelism configuration: 7 agents working concurrently.
3. M3 stream A (Galaxy resolver) in parallel with M2 wave 3. Galaxy doesn't need the RPC transport; it can develop while the callback exporter is being built.
4. M4 wave 1 (Session core + RecoveryPolicy), sequential after M3 stream B lands.
5. M6 (4 agents), once both M4 and M5 land.
Beyond step 5, the work is release-engineering, not feature work.