Joseph Doherty 16f2c148e5 design: parallelism map + /loop driver prompt + followups triage

- design/dependencies.md: per-milestone parallelism map for M2–M6 with
  per-phase agent budgets (peak 4 in parallel for M5 framing wave;
  7-agent maximum if M2 wave 1 + M5 framing run concurrently).
- design/prompt.md: self-contained /loop driver. Step 0 triages
  design/followups.md (auto-resolves items whose preconditions are met,
  shelves the rest). Step 3 spawns parallel general-purpose agents per
  design/dependencies.md when the active wave has multiple lanes.
  Sequential lanes (M4 Session core, M5 client integration) run directly.
  Local-commit-only by default; explicit stop conditions; Q7 hasDetailStatus
  audit reminder for any new conditional-read codec port.
- design/README.md: index updated to reference prompt.md, followups.md,
  dependencies.md, and review.md.

design/followups.md is intentionally not pre-created — prompt.md Step 0
bootstraps it on first /loop run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-05 06:34:30 -04:00

9.4 KiB


Dependencies and parallelism map

Where the M2–M6 work can be run in parallel, where it can't, and the agent budget per phase. Sits alongside 60-roadmap.md — the roadmap describes what each milestone delivers and its DoD; this file describes the dependency graph inside and across milestones so multiple agents (or developers) can be scheduled without stepping on each other.

Cross-milestone parallelism

Already encoded in the roadmap's "Sequencing dependencies" table. The headline:

M0 ─► M1 ─► M2 ─► M3 ─► M4 ─┐
              │             ├─► M6 ─► release
              └──────► M5 ──┘

M5 (entire ASB path) runs in parallel with M3+M4 (entire NMX path). This is only possible because the cluster-4 sequencing fix moved the Transport trait + Session shape to M0 — they are stable enough at M0 that M5 can build against the trait without waiting for M4 to land the NMX impl. ASB has no transitive dependency on DCE/RPC, NTLM, OBJREF, or OXID.

The other dependency edges are tight: M3 cannot run in parallel with M2 (it needs the live RPC transport to drive register_engine_2); M4 cannot run in parallel with M3 (the async session wraps the raw NMX client built in M3).
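The edges above can be checked mechanically: two milestones can be scheduled in parallel only if neither is reachable from the other in the dependency graph. The following is a minimal sketch of that check, not project code. The function names are hypothetical, and the edge `M1 -> M5` is an assumption (the diagram does not pin down exactly where the M5 lane branches off; the text only says M5 builds against the trait moved to M0).

```rust
use std::collections::{HashMap, HashSet};

/// Edges point from a prerequisite milestone to the milestones that need it.
/// Assumption: M5 branches off after M1; the source diagram is ambiguous here.
fn milestone_edges() -> HashMap<&'static str, Vec<&'static str>> {
    HashMap::from([
        ("M0", vec!["M1"]),
        ("M1", vec!["M2", "M5"]),
        ("M2", vec!["M3"]),
        ("M3", vec!["M4"]),
        ("M4", vec!["M6"]),
        ("M5", vec!["M6"]),
    ])
}

/// Depth-first search: is `to` reachable from `from` along dependency edges?
fn reaches<'a>(edges: &HashMap<&'a str, Vec<&'a str>>, from: &'a str, to: &str) -> bool {
    let mut stack = vec![from];
    let mut seen = HashSet::new();
    while let Some(node) = stack.pop() {
        if node == to {
            return true;
        }
        // Only expand a node the first time it is visited.
        if seen.insert(node) {
            if let Some(next) = edges.get(node) {
                stack.extend(next.iter().copied());
            }
        }
    }
    false
}

/// Two distinct milestones can overlap only if neither depends on the other,
/// directly or transitively.
fn can_run_in_parallel<'a>(
    edges: &HashMap<&'a str, Vec<&'a str>>,
    a: &'a str,
    b: &'a str,
) -> bool {
    a != b && !reaches(edges, a, b) && !reaches(edges, b, a)
}
```

Under these assumed edges, `can_run_in_parallel(&edges, "M5", "M3")` holds while `can_run_in_parallel(&edges, "M3", "M2")` does not, which matches the prose above.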

Within-milestone parallelism

M2 — DCE/RPC + NTLM + OBJREF + OXID + callback exporter

| Wave | Parallelizable streams | Why they're independent |
|---|---|---|
| 1 | (a) NTLMv2 client context · (b) DCE/RPC PDU codec · (c) OBJREF parser | All pure-codec/crypto, no I/O, no shared state. Each maps cleanly to one Rust module under `mxaccess-rpc`. |
| 2 | (d) OXID resolution · (e) `IRemUnknown::RemQueryInterface` | Both depend on (b) but not on each other. |
| 3 | (f) Callback exporter (the `mxaccess-callback` crate: `INmxSvcCallback` server, `IRemUnknown` server, OBJREF export) | Depends on (a), (b), (e). Single crate, single agent. |

Peak agents in parallel: 3 (wave 1). Each agent owns one .cs source family in src/MxNativeClient/ and emits one Rust module.
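The wave numbers in the table follow directly from the stated dependencies: a stream's wave is one more than the latest wave among the streams it depends on. Here is a minimal sketch of that derivation, assuming the stream letters (a) to (f) and the dependencies exactly as listed in the table; the function names are hypothetical.

```rust
use std::collections::BTreeMap;

/// The M2 streams and what each depends on, as stated in the table above.
fn m2_streams() -> BTreeMap<&'static str, Vec<&'static str>> {
    BTreeMap::from([
        ("a", vec![]),
        ("b", vec![]),
        ("c", vec![]),
        ("d", vec!["b"]),
        ("e", vec!["b"]),
        ("f", vec!["a", "b", "e"]),
    ])
}

/// Groups streams into waves (index 0 is the first wave).
/// Returns `None` if there is a cycle or a dependency on an unknown stream.
fn waves<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<Vec<&'a str>>> {
    let mut wave_of: BTreeMap<&'a str, usize> = BTreeMap::new();
    while wave_of.len() < deps.len() {
        let mut progressed = false;
        for (&stream, needs) in deps {
            if wave_of.contains_key(stream) {
                continue;
            }
            // `None` here means at least one dependency has no wave yet.
            let dep_waves: Option<Vec<usize>> =
                needs.iter().map(|d| wave_of.get(d).copied()).collect();
            if let Some(dep_waves) = dep_waves {
                let wave = dep_waves.into_iter().max().map_or(0, |w| w + 1);
                wave_of.insert(stream, wave);
                progressed = true;
            }
        }
        if !progressed {
            return None;
        }
    }
    let count = wave_of.values().max().map_or(0, |w| w + 1);
    let mut out = vec![Vec::new(); count];
    for (stream, wave) in wave_of {
        out[wave].push(stream);
    }
    Some(out)
}
```

Calling `waves(&m2_streams())` groups the streams as a, b, c, then d, e, then f. The widest group has three entries, which is where the peak of 3 comes from.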

M3 — NMX session + Galaxy resolver

| Stream | Owns | Depends on |
|---|---|---|
| A | `mxaccess-galaxy`: SQL resolver (`tag_name`-form input only; `wwtools/grdb/`), user resolver, `dbo.schema_version` startup probe | M0 + M1 (`MxReferenceHandle` for the output type) |
| B | `mxaccess-nmx`: `NmxClient` with `register_engine_2`, `transfer_data`, `add_subscriber_engine`, `set_heartbeat_send_interval`, `unregister_engine`, `get_partner_version`. Builds `MxReferenceHandle` from resolver output + CRC-16/IBM. | M2 (DCE/RPC + callback exporter) |

A and B are fully independent: different crates, different .cs reference sources, different external dependencies. Peak agents in parallel: 2. B can be split further per opnum (4 small tasks, one per primary method) if a third agent is available.
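Stream B mentions CRC-16/IBM as an input to `MxReferenceHandle`. That name usually refers to the CRC-16/ARC parameter set (polynomial 0x8005, processed bit-reflected as 0xA001, initial value 0, no final XOR). The sketch below implements that parameter set only. Whether the project uses exactly this variant, and which bytes of the resolver output are hashed, is not stated here and would need to be confirmed against the .cs reference sources.

```rust
/// CRC-16/ARC, the parameter set commonly called "CRC-16/IBM".
/// Assumption: this is the variant meant by the design doc.
fn crc16_arc(data: &[u8]) -> u16 {
    let mut crc: u16 = 0;
    for &byte in data {
        crc ^= byte as u16;
        // Process one bit at a time, least significant bit first.
        for _ in 0..8 {
            crc = if crc & 1 != 0 {
                (crc >> 1) ^ 0xA001
            } else {
                crc >> 1
            };
        }
    }
    crc
}
```

The standard check input `b"123456789"` yields `0xBB3D` for this parameter set, which is a convenient first unit test before wiring the function into the handle builder.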

M4 — Async Tokio façade (NMX path)

This is the milestone where parallelism helps least. The Session orchestration layer is genuinely sequential — the recovery state machine, the connection task that owns the TCP stream + callback channel, and the correlation-ID bookkeeping are one cross-cutting design that's hard to chunk across agents without integration pain.

| Wave | Parallelizable | Notes |
|---|---|---|
| 1 | (a) `Session` core + long-lived connection task · (b) `RecoveryPolicy` + `RecoveryEvent` types | (b) is small but design-pivotal: agree on the event shape before consumers depend on it. |
| 2 | (c) write family: `write`, `write_with_completion`, `write_with_timestamp`, `write_secured`, `write_secured_at` · (d) subscribe family: `read`, `subscribe`, `subscribe_many`, `subscribe_buffered` | After (a) lands. (c) and (d) share the connection task but operate on disjoint state. |
| 3 | All 7 example programs (`connect-write-read.rs`, `subscribe.rs`, `subscribe-buffered.rs`, `recovery.rs`, `multi-tag.rs`, `secured-write.rs`, `asb-subscribe.rs`) | Pure consumer code, no API impact. Can split to one agent per example. |

Peak agents in parallel: 2 in wave 2 (write-family vs subscribe-family). Don't try to split tighter — the connection task has too much shared mutable state (subscription registry, in-flight correlation table, recovery flag).
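To make the "shared task, disjoint state" point concrete, here is a minimal sketch of a single owner loop that holds both the in-flight correlation table and the subscription registry, with the write and subscribe families reaching it only through commands. It uses `std::thread` and `std::sync::mpsc` rather than Tokio to stay dependency-free, and every type, field, and tag name in it is hypothetical rather than the project's actual `Session` design.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Requests the two API families send to the connection owner.
enum Command {
    Write { tag: String, reply: mpsc::Sender<u32> },
    Subscribe { tag: String, reply: mpsc::Sender<u32> },
    Shutdown,
}

/// State owned exclusively by the connection loop, so no locks are needed.
#[derive(Default)]
struct ConnectionState {
    next_correlation_id: u32,
    next_subscription_id: u32,
    in_flight: HashMap<u32, String>,     // touched only by the write family
    subscriptions: HashMap<u32, String>, // touched only by the subscribe family
}

/// Runs until `Shutdown` arrives or every sender is dropped.
fn run_connection(rx: mpsc::Receiver<Command>) -> ConnectionState {
    let mut state = ConnectionState::default();
    for cmd in rx {
        match cmd {
            Command::Write { tag, reply } => {
                state.next_correlation_id += 1;
                let id = state.next_correlation_id;
                state.in_flight.insert(id, tag);
                let _ = reply.send(id); // the caller may have gone away
            }
            Command::Subscribe { tag, reply } => {
                state.next_subscription_id += 1;
                let id = state.next_subscription_id;
                state.subscriptions.insert(id, tag);
                let _ = reply.send(id);
            }
            Command::Shutdown => break,
        }
    }
    state
}

/// Spawns the owner loop on its own thread and returns the command sender.
fn spawn_connection() -> (mpsc::Sender<Command>, thread::JoinHandle<ConnectionState>) {
    let (tx, rx) = mpsc::channel();
    (tx, thread::spawn(move || run_connection(rx)))
}
```

The two match arms never touch each other's map, which is why (c) and (d) can be built by separate agents once the loop itself exists. Splitting the loop is another matter, because the recovery flag and the socket would have to be shared across whatever replaces it.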

M5 — ASB transport

The heaviest milestone in raw LoC after the wwtools/mxaccesscli/ verification. The R1 estimate (70-risks-and-open-questions.md) puts it at ~3000 LoC for the framing + encoder layers alone. It splits cleanly along spec boundaries:

| Stream | Owns |
|---|---|
| A | `[MS-NMF]` net.tcp framing: record types (preamble, preamble-ack, sized-envelope, end, fault) + reliable-session ack handling on the underlying TCP channel |
| B | `[MC-NBFX]` binary-XML node codec: read/write tokenised XML (start-element, end-element, attribute, text, etc.) |
| C | `[MC-NBFS]` static dictionary table: the SOAP/WS-Addressing/`IASBIDataV2` action strings the encoder references by ID instead of inlining |
| D | Application auth: DH key exchange (constant-time `crypto-bigint` rather than the .NET `BigInteger.ModPow` defect) + HMAC integrity + AES-128 + DPAPI shared-secret read on Windows |
| E | `mxaccess-asb` client: `Connect`, `RegisterItems`, `Read`, `Write`, `CreateSubscription`, `AddMonitoredItems`, `Publish`, `Disconnect`. Depends on A+B+C+D. |

Peak agents in parallel: 4 in the framing/encoding wave (A+B+C+D), then E is sequential (or 2-way: read/write paths vs subscription paths).
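As an illustration of why stream A is a self-contained codec task, here is a minimal sketch of the variable-length size encoding used by net.tcp framing records and of wrapping a payload in a sized-envelope record. It reflects my reading of `[MS-NMF]` (7 bits per byte, least significant group first, at most five bytes, sized-envelope record type `0x06`); treat those details as assumptions to verify against the spec. The function names are hypothetical and not taken from the project.

```rust
/// Record type byte for a sized envelope (assumed from [MS-NMF]; verify).
const SIZED_ENVELOPE: u8 = 0x06;

/// Encodes a size as base-128 groups, least significant first.
/// The high bit is set on every byte except the last.
fn encode_size(mut size: u32) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let group = (size & 0x7F) as u8;
        size >>= 7;
        if size == 0 {
            out.push(group);
            return out;
        }
        out.push(group | 0x80);
    }
}

/// Decodes a size, returning the value and the number of bytes consumed.
/// Returns `None` if the input is truncated or does not fit in 32 bits.
fn decode_size(input: &[u8]) -> Option<(u32, usize)> {
    let mut size: u32 = 0;
    for (i, &byte) in input.iter().enumerate().take(5) {
        let group = (byte & 0x7F) as u32;
        // The fifth byte may only carry the top 4 bits of a u32.
        if i == 4 && group > 0x0F {
            return None;
        }
        size |= group << (7 * i);
        if byte & 0x80 == 0 {
            return Some((size, i + 1));
        }
    }
    None
}

/// Wraps a payload as: record type, encoded size, payload bytes.
/// Returns `None` if the payload length does not fit in a u32.
fn frame_sized_envelope(payload: &[u8]) -> Option<Vec<u8>> {
    let len = u32::try_from(payload.len()).ok()?;
    let mut out = vec![SIZED_ENVELOPE];
    out.extend(encode_size(len));
    out.extend_from_slice(payload);
    Some(out)
}
```

Everything here is pure byte manipulation with no I/O, so it can be tested against captured fixtures without touching streams B, C, or D.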

M6 — Compat shim + production hardening

Fully parallel — the four streams are different crates or different feature gates, no inter-stream design coupling.

| Stream | Owns |
|---|---|
| A | `mxaccess-compat`: `LMXProxyServer`-shaped methods layered on top of `Session`. Streams + async fns; the `mxaccess-compat-com` crate (post-V1) registers `windows-rs`-generated COM classes on top. |
| B | Performance pass: `bytes::Bytes` zero-copy on receive paths, `BytesMut` pre-allocation per session, codec allocation count benchmarked, hits R12's `< 5 allocations per write at steady state` target. |
| C | `metrics` feature: counters + histograms via the `metrics` crate. Optional, not on the default-feature path. |
| D | Docs + release: `cargo doc`, `cargo public-api` baseline, README polish, `cargo publish` per crate in topological order. |

Peak agents in parallel: 4. Each owns a different module or feature, no shared mutable state.

Practical agent budget

| Phase | Peak parallel agents | Sequential bottleneck |
|---|---|---|
| M2 | 3 (wave 1) | callback exporter (wave 3) |
| M3 | 2 | live-probe DoD (single AVEVA install) |
| M4 | 2 | `Session` core + connection task |
| M5 | 4 (framing wave) | client (E) |
| M6 | 4 | none (release sequencing only) |

If agents run in parallel per wave, the way M1 ran, peak utilization within a single milestone is 4 agents (the M5 framing wave; M6 also reaches 4). The honest sequential bottleneck is M4's Session orchestration: it is the one milestone where parallelism doesn't help much, because the recovery state machine is one tightly coupled design.
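The budget figures can be reproduced with a few lines of arithmetic. The sketch below encodes the per-phase peaks from the table and computes both the single-milestone peak and the combined load when two phases overlap. The type and function names are hypothetical, and summing peaks assumes the two peak waves actually coincide in time (for example M2 wave 1 alongside the M5 framing wave).

```rust
/// One row of the agent budget table.
struct Phase {
    name: &'static str,
    peak_agents: u32,
}

/// Peak parallel agents per phase, copied from the table above.
const BUDGET: [Phase; 5] = [
    Phase { name: "M2", peak_agents: 3 },
    Phase { name: "M3", peak_agents: 2 },
    Phase { name: "M4", peak_agents: 2 },
    Phase { name: "M5", peak_agents: 4 },
    Phase { name: "M6", peak_agents: 4 },
];

/// Largest peak of any single phase (0 for an empty budget).
fn peak_agents(budget: &[Phase]) -> u32 {
    budget.iter().map(|p| p.peak_agents).max().unwrap_or(0)
}

/// Combined load if the named phases hit their peak waves at the same time.
fn concurrent_agents(budget: &[Phase], names: &[&str]) -> u32 {
    budget
        .iter()
        .filter(|p| names.contains(&p.name))
        .map(|p| p.peak_agents)
        .sum()
}
```

With this data, `peak_agents(&BUDGET)` gives 4 and `concurrent_agents(&BUDGET, &["M2", "M5"])` gives 7.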

Wall-clock estimate

Strictly sequential (one developer, one stream): roughly the M2–M6 LoC volume divided by sustained Rust output. Realistic estimate ~12–16 weeks for V1 from M2 start.

Aggressive parallelism (M5 in parallel with M3+M4 + within-milestone agent fan-out): roughly 60% of the sequential wall-clock, ~7–10 weeks. Past that point, coordination overhead and integration debugging eat the gains.

The biggest single win is the M5-parallel-with-M3+M4 lane: ~3–4 weeks saved on its own. Within-milestone parallelism accounts for the remaining ~1–2 weeks of the overall saving, and it flattens out fast: splitting M4 into 4 streams is not twice as fast as splitting it into 2.

Constraints that block further parallelism

These bottlenecks are outside our control. They are listed here so they don't get treated as parallelization opportunities:

- Live-probe DoDs need a single live AVEVA install. Two agents can't both probe register_engine_2 against the same NmxSvc.exe at the same time: the second one races against the first's RPC channel. Live-probing is sequential per shared resource.
- Captured-fixture round-trip tests are CPU-bound but trivially small. Not worth parallelizing the runner.
- Cross-cutting design decisions (error taxonomy, recovery semantics, tracing field naming) need to land before consumer code can be written. These are "wave 0" of each milestone: single-agent, fast.
- cargo publish ordering is a true topological sort (codec before transport before session); cannot be parallelized.

Recommended sequencing

If picking which lane to push next given the M0+M1 state today:

1. M2 wave 1 (3 agents): NTLM, DCE/RPC PDU codec, OBJREF parser. Highest parallelism return, foundational for everything else.
2. M5 framing wave (4 agents) in parallel with M2 wave 1, only if you have the agent budget. The M5 streams land in mxaccess-asb-nettcp and the M2 streams in mxaccess-rpc, so they don't overlap. This is the maximum-parallelism configuration: 7 agents working concurrently.
3. M3 stream A (Galaxy resolver) in parallel with M2 wave 3. Galaxy doesn't need the RPC transport; it can be developed while the callback exporter is being built.
4. M4 wave 1 (Session core + RecoveryPolicy): sequential after M3 stream B lands.
5. M6 (4 agents): once both M4 and M5 land.

Beyond step 5, the work is release-engineering, not feature work.
