Files
mxaccess/design/M6-bench-baseline.md
T
Joseph Doherty 4e76b44391 [F52.1] mxaccess-codec: BytesMut output buffer for write encoder
Adds `write_message::encode_to_bytes_mut` (and the timestamped variant)
returning a freshly-allocated `BytesMut`. Allocation count is identical
to `encode` (2 allocs/op for fixed-width scalars); the benefit is
downstream — consumers can `BytesMut::split_to` / `freeze` and forward
the body bytes to a wire-level sink without an intermediate copy.

The body builders (`encode_boolean` / `encode_fixed` / `encode_variable`
/ `encode_array`) were refactored to fill a pre-sized `&mut [u8]`
rather than each allocating their own `Vec<u8>`. The dispatcher
computes the body size up front via small `*_body_size` helpers and
resizes the destination buffer (Vec or BytesMut) once. This is also
the prerequisite refactor for F52.3.

Bench delta in `design/M6-bench-baseline.md` § F52.1; existing
`encode` row unchanged at 2 allocs/op. All 265 round-trip tests
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:46:02 -04:00

96 lines
4.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# M6 — `mxaccess-codec` allocation-count baseline
Source: `cargo bench -p mxaccess-codec` (commit recording this file).
Harness: `crates/mxaccess-codec/benches/alloc_count.rs` — a thin
`GlobalAlloc` wrapper that increments two atomics on every `alloc` /
`dealloc` call, then runs each scenario for 10k iterations after a
1k-iteration warm-up.
## Target (per `70-risks-and-open-questions.md` R12)
> Aim for < 5 allocations per write at steady state.
The bench gates on this: any `write_message::encode` scenario at
≥ 5 allocs/op causes the binary to exit with code 1.
## Baseline (release profile, Windows x64)
| scenario | iters | allocs/op | bytes/op | deallocs/op |
|------------------------------------------------|--------:|----------:|---------:|------------:|
| `write_message::encode` (Int32) | 10,000 | 2.00 | 44 | 2.00 |
| `write_message::encode` (Float32) | 10,000 | 2.00 | 44 | 2.00 |
| `write_message::encode` (Float64) | 10,000 | 2.00 | 52 | 2.00 |
| `write_message::encode` (Boolean) | 10,000 | 1.00 | 37 | 1.00 |
| `write_message::encode` (String, 5 chars) | 10,000 | 4.00 | 92 | 4.00 |
| `write_message::encode_to_bytes_mut` (Int32) | 10,000 | 2.00 | 44 | 2.00 |
| `MxReferenceHandle::from_names` | 10,000 | 2.00 | 22 | 2.00 |
| `NmxSubscriptionMessage::parse_inner` | 10,000 | 1.00 | 72 | 1.00 |
| (DataUpdate, Int32) | | | | |
## Read
R12's < 5 allocs/write target is **already met** across the proven matrix:
- Scalar writes (Bool, Int32, Float32, Float64) sit at 12 allocs/op.
The two allocs come from (1) the encoder's `Vec<u8>` output buffer
and (2) an internal scratch buffer in the value-encode path.
- String writes hit 4 allocs/op (output buffer, UTF-16LE conversion
buffer, the inner-length wrapper, and one more downstream).
- `MxReferenceHandle::from_names` allocates twice (one per
`compute_name_signature` call — UTF-16LE buffer for each name).
- `NmxSubscriptionMessage::parse_inner` allocates once for the
`records: Vec<NmxSubscriptionRecord>` collection.
## Implications for F39
F39 (zero-copy pass) was scoped as the work to *hit* the R12 target.
With the target already met, F39's scope tightens to:
- Move the encoder's output buffer to `bytes::BytesMut` so consumers
can split it without copying. Doesn't reduce alloc count but
improves downstream zero-copy on the wire-write path.
- Cache the per-handle UTF-16LE name conversion (the two
`compute_name_signature` allocs per `from_names`) inside
`MxReferenceHandle` if the same name is registered repeatedly.
- Pool the per-frame scratch buffer at the session level so the
per-write count drops from 2 → 1 on hot paths.
These are nice-to-have optimisations rather than R12 blockers.
## F52 deltas
F52 split the three F39 sub-tasks into their own commits. Each
optimisation lands with a before/after row in this section.
### F52.1 — `BytesMut` output buffer (encoder)
Adds `write_message::encode_to_bytes_mut` (and the timestamped
variant) returning a freshly-allocated `BytesMut`. Allocation count
is **identical** to the existing `encode` path — the benefit is
downstream: consumers can `BytesMut::split_to` / `freeze` and forward
the body bytes to a wire-level sink without an intermediate copy.
| scenario | before (allocs/op) | after (allocs/op) |
|----------------------------------------------|-------------------:|------------------:|
| `write_message::encode` (Int32) | 2.00 | 2.00 |
| `write_message::encode_to_bytes_mut` (Int32) | — | 2.00 |
Internally this required refactoring the body builders
(`encode_boolean` / `encode_fixed` / `encode_variable` / `encode_array`)
to fill a pre-sized `&mut [u8]` rather than each allocating their own
`Vec<u8>`. The dispatcher computes the body size up front via small
`*_body_size` helpers and resizes the destination buffer (Vec or
BytesMut) once. This is also the prerequisite refactor for F52.3.
## Reproducing
```powershell
cd rust
cargo bench -p mxaccess-codec
```
Numbers are deterministic per release-profile build on a given host.
Numeric drift across hosts is expected (the warm-up + black_box hints
keep iteration counts stable, not the underlying allocator's
small-alloc fast-path heuristics).