Adds `write_message::encode_to_bytes_mut` (and the timestamped variant) returning a freshly-allocated `BytesMut`. Allocation count is identical to `encode` (2 allocs/op for fixed-width scalars); the benefit is downstream — consumers can `BytesMut::split_to` / `freeze` and forward the body bytes to a wire-level sink without an intermediate copy. The body builders (`encode_boolean` / `encode_fixed` / `encode_variable` / `encode_array`) were refactored to fill a pre-sized `&mut [u8]` rather than each allocating their own `Vec<u8>`. The dispatcher computes the body size up front via small `*_body_size` helpers and resizes the destination buffer (Vec or BytesMut) once. This is also the prerequisite refactor for F52.3. Bench delta in `design/M6-bench-baseline.md` § F52.1; existing `encode` row unchanged at 2 allocs/op. All 265 round-trip tests unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
M6 — mxaccess-codec allocation-count baseline
Source: cargo bench -p mxaccess-codec (commit recording this file).
Harness: crates/mxaccess-codec/benches/alloc_count.rs — a thin
GlobalAlloc wrapper that increments two atomics on every alloc /
dealloc call, then runs each scenario for 10k iterations after a
1k-iteration warm-up.
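A minimal sketch of how such a counting allocator can be wired up, for readers who have not opened the harness. This is not the contents of alloc_count.rs: the names CountingAlloc, ALLOCS, DEALLOCS and allocs_per_op are hypothetical, and only the standard library is used.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical counters; the real harness may name and size these differently.
static ALLOCS: AtomicU64 = AtomicU64::new(0);
static DEALLOCS: AtomicU64 = AtomicU64::new(0);

/// Thin wrapper that forwards to the system allocator and counts calls.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        DEALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `scenario` for `warmup` uncounted iterations, then `iters` counted
/// ones, and returns the average number of alloc calls per iteration.
fn allocs_per_op<F: FnMut()>(warmup: u64, iters: u64, mut scenario: F) -> f64 {
    for _ in 0..warmup {
        scenario();
    }
    let before = ALLOCS.load(Ordering::Relaxed);
    for _ in 0..iters {
        scenario();
    }
    let after = ALLOCS.load(Ordering::Relaxed);
    (after - before) as f64 / iters as f64
}
```

The counters are process-wide, so a measurement like this is only meaningful when nothing else allocates concurrently during the counted loop.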
Target (per 70-risks-and-open-questions.md R12)
Aim for < 5 allocations per write at steady state.
The bench gates on this: any write_message::encode scenario at
≥ 5 allocs/op causes the binary to exit with code 1.
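The gate described above could be expressed roughly as follows. This is a sketch under assumptions: the function name gate_exit_code and the prefix match on the scenario label are hypothetical, and whether the real bench also gates the encode_to_bytes_mut rows is not stated in this file.

```rust
/// R12 threshold: a write scenario at or above this many allocs/op fails.
const MAX_ALLOCS_PER_WRITE: f64 = 5.0;

/// Returns the process exit code for a set of (scenario label, allocs/op)
/// results: 1 if any write_message::encode scenario breaches the target,
/// otherwise 0. Matching by label prefix is an assumption of this sketch.
fn gate_exit_code(results: &[(&str, f64)]) -> i32 {
    let mut code = 0;
    for (name, allocs_per_op) in results {
        let is_write = name.starts_with("write_message::encode");
        if is_write && *allocs_per_op >= MAX_ALLOCS_PER_WRITE {
            eprintln!("R12 gate failed: {name} at {allocs_per_op:.2} allocs/op");
            code = 1;
        }
    }
    code
}
```

A bench binary would pass the returned value to std::process::exit once all scenarios have been reported.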
Baseline (release profile, Windows x64)
| scenario | iters | allocs/op | bytes/op | deallocs/op |
|---|---|---|---|---|
| write_message::encode (Int32) | 10,000 | 2.00 | 44 | 2.00 |
| write_message::encode (Float32) | 10,000 | 2.00 | 44 | 2.00 |
| write_message::encode (Float64) | 10,000 | 2.00 | 52 | 2.00 |
| write_message::encode (Boolean) | 10,000 | 1.00 | 37 | 1.00 |
| write_message::encode (String, 5 chars) | 10,000 | 4.00 | 92 | 4.00 |
| write_message::encode_to_bytes_mut (Int32) | 10,000 | 2.00 | 44 | 2.00 |
| MxReferenceHandle::from_names | 10,000 | 2.00 | 22 | 2.00 |
| NmxSubscriptionMessage::parse_inner (DataUpdate, Int32) | 10,000 | 1.00 | 72 | 1.00 |
Read
R12's < 5 allocs/write target is already met across the proven matrix:
- Scalar writes (Bool, Int32, Float32, Float64) sit at 1–2 allocs/op. The two allocs come from (1) the encoder's Vec<u8> output buffer and (2) an internal scratch buffer in the value-encode path.
- String writes hit 4 allocs/op (output buffer, UTF-16LE conversion buffer, the inner-length wrapper, and one more downstream).
- MxReferenceHandle::from_names allocates twice (one per compute_name_signature call: a UTF-16LE buffer for each name).
- NmxSubscriptionMessage::parse_inner allocates once for the records: Vec<NmxSubscriptionRecord> collection.
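To illustrate why each name costs one allocation, here is a minimal sketch of a UTF-16LE conversion into an owned buffer. The helper name utf16le_bytes is hypothetical, and the signature computation that compute_name_signature performs over those bytes is deliberately not reproduced.

```rust
/// Converts `name` to UTF-16LE bytes in a freshly allocated buffer.
/// The capacity is an upper bound (a UTF-8 string never yields more UTF-16
/// code units than it has bytes), so the Vec allocates exactly once.
fn utf16le_bytes(name: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(name.len() * 2);
    for unit in name.encode_utf16() {
        out.extend_from_slice(&unit.to_le_bytes());
    }
    out
}
```

Calling a helper like this once per name would account for the two allocations recorded for from_names.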
Implications for F39
F39 (zero-copy pass) was scoped as the work to hit the R12 target. With the target already met, F39's scope tightens to:
- Move the encoder's output buffer to bytes::BytesMut so consumers can split it without copying. Doesn't reduce alloc count but improves downstream zero-copy on the wire-write path.
- Cache the per-handle UTF-16LE name conversion (the two compute_name_signature allocs per from_names) inside MxReferenceHandle if the same name is registered repeatedly.
- Pool the per-frame scratch buffer at the session level so the per-write count drops from 2 → 1 on hot paths.
These are nice-to-have optimisations rather than R12 blockers.
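The scratch-buffer pooling idea can be sketched with a reusable Vec owned by a longer-lived object. The type ScratchPool and its methods are hypothetical stand-ins, not the session type in this codebase; the point is only that clearing and resizing a Vec keeps its capacity, so steady-state writes stop paying for the scratch allocation.

```rust
/// Hypothetical holder for a per-session scratch buffer.
struct ScratchPool {
    scratch: Vec<u8>,
}

impl ScratchPool {
    fn new() -> Self {
        Self { scratch: Vec::new() }
    }

    /// Lends out a zeroed scratch slice of `len` bytes. The backing
    /// allocation is reused whenever `len` fits the existing capacity.
    fn with_scratch<R>(&mut self, len: usize, f: impl FnOnce(&mut [u8]) -> R) -> R {
        self.scratch.clear();
        self.scratch.resize(len, 0);
        f(&mut self.scratch)
    }

    /// Current capacity of the backing buffer, exposed for inspection.
    fn capacity(&self) -> usize {
        self.scratch.capacity()
    }
}
```

The first write grows the buffer; later writes of equal or smaller size reuse it, which is how the per-write count could fall from 2 to 1.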
F52 deltas
F52 split the three F39 sub-tasks into their own commits. Each optimisation lands with a before/after row in this section.
F52.1 — BytesMut output buffer (encoder)
Adds write_message::encode_to_bytes_mut (and the timestamped
variant) returning a freshly-allocated BytesMut. Allocation count
is identical to the existing encode path — the benefit is
downstream: consumers can BytesMut::split_to / freeze and forward
the body bytes to a wire-level sink without an intermediate copy.
| scenario | before (allocs/op) | after (allocs/op) |
|---|---|---|
| write_message::encode (Int32) | 2.00 | 2.00 |
| write_message::encode_to_bytes_mut (Int32) | n/a | 2.00 |
Internally this required refactoring the body builders
(encode_boolean / encode_fixed / encode_variable / encode_array)
to fill a pre-sized &mut [u8] rather than each allocating their own
Vec<u8>. The dispatcher computes the body size up front via small
*_body_size helpers and resizes the destination buffer (Vec or
BytesMut) once. This is also the prerequisite refactor for F52.3.
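The size-then-fill pattern described above can be sketched as follows. Everything here is illustrative: the Value enum, body_size, fill_body and encode_into are hypothetical, the little-endian layout and header handling are assumptions, and a plain Vec<u8> stands in for the destination (the same shape would apply to BytesMut).

```rust
/// Hypothetical value type; the real codec's variants are not shown here.
enum Value {
    Boolean(bool),
    Int32(i32),
    Float64(f64),
}

/// Body size in bytes, known before any bytes are written.
fn body_size(value: &Value) -> usize {
    match value {
        Value::Boolean(_) => 1,
        Value::Int32(_) => 4,
        Value::Float64(_) => 8,
    }
}

/// Fills a pre-sized slice instead of allocating a Vec per builder.
/// `dst` must be exactly `body_size(value)` bytes long.
fn fill_body(value: &Value, dst: &mut [u8]) {
    match value {
        Value::Boolean(b) => dst[0] = u8::from(*b),
        Value::Int32(x) => dst.copy_from_slice(&x.to_le_bytes()),
        Value::Float64(x) => dst.copy_from_slice(&x.to_le_bytes()),
    }
}

/// Dispatcher: computes the total size, resizes the destination once,
/// then lets the builder write into its slice in place.
fn encode_into(header: &[u8], value: &Value, out: &mut Vec<u8>) {
    let start = out.len();
    out.resize(start + header.len() + body_size(value), 0);
    let (head, body) = out[start..].split_at_mut(header.len());
    head.copy_from_slice(header);
    fill_body(value, body);
}
```

Because the builders never own a buffer, the only allocation on this path is the single resize of the destination.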
Reproducing
cd rust
cargo bench -p mxaccess-codec
Numbers are deterministic per release-profile build on a given host. Numeric drift across hosts is expected (the warm-up + black_box hints keep iteration counts stable, not the underlying allocator's small-alloc fast-path heuristics).