mxaccess/design/M6-bench-baseline.md
Joseph Doherty 71c69b80c6 [F38] mxaccess-codec: counting-allocator bench harness + R12 baseline
Hand-rolled GlobalAlloc wrapper around System that tracks allocs +
bytes + deallocs via two atomics. Each scenario runs 10k iterations
after a 1k warm-up; output is a markdown table with allocs/op,
bytes/op, deallocs/op.

Why hand-rolled (not dhat/criterion): R12 gates on a single number
("< 5 allocs/write"). dhat is heap-profiling-oriented (call-stack
attribution, JSON snapshots); criterion measures wall-clock latency
which is reported-but-not-gated per 60-roadmap.md:104. A 50-line
GlobalAlloc + atomic counters is the simplest thing that answers
the gate.

Run: `cargo bench -p mxaccess-codec`

Baseline numbers (release, Windows x64):
- Bool write:    1.00 allocs/op
- Int32 write:   2.00 allocs/op
- Float32 write: 2.00 allocs/op
- Float64 write: 2.00 allocs/op
- String write:  4.00 allocs/op (5-char string)
- Handle from_names: 2.00 allocs/op
- DataUpdate decode: 1.00 allocs/op

R12's < 5 allocs/write target is **already met** across the proven
matrix without any zero-copy work. The bench gates on this — any
write_message::encode scenario at >= 5 allocs/op exits the harness
with code 1.

Companion: `design/M6-bench-baseline.md` documents the numbers,
explains the per-scenario breakdown, and tightens F39's scope from
"hit the target" to "nice-to-have optimisations" (BytesMut output
buffer, name-signature cache, session-level scratch pool).

Workspace: 759 tests still pass; clippy --benches clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:45:33 -04:00

# M6 — `mxaccess-codec` allocation-count baseline
Source: `cargo bench -p mxaccess-codec` (commit recording this file).
Harness: `crates/mxaccess-codec/benches/alloc_count.rs` — a thin
`GlobalAlloc` wrapper that increments two atomics on every `alloc` /
`dealloc` call, then runs each scenario for 10k iterations after a
1k-iteration warm-up.
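
The harness source is not reproduced here, so the following is only a minimal sketch of the general technique: a `GlobalAlloc` wrapper around `System` that bumps atomic counters, plus a warm-up-then-measure loop. The names `CountingAlloc`, `measure`, and `sample_vec_scenario` are assumptions for illustration, and the sketch keeps one counter per reported column, which may differ from how `alloc_count.rs` lays out its atomics.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicU64, Ordering};

// Process-wide counters updated by the allocator wrapper below.
static ALLOCS: AtomicU64 = AtomicU64::new(0);
static BYTES: AtomicU64 = AtomicU64::new(0);
static DEALLOCS: AtomicU64 = AtomicU64::new(0);

/// Forwards every request to `System` and counts calls on the way through.
/// `realloc` is not overridden, so the trait's default (alloc + copy +
/// dealloc) is counted as one alloc and one dealloc.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        BYTES.fetch_add(layout.size() as u64, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        DEALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `warmup` uncounted iterations, then `iters` counted ones, and
/// returns (allocs/op, bytes/op, deallocs/op).
fn measure<F: FnMut()>(warmup: u64, iters: u64, mut f: F) -> (f64, f64, f64) {
    for _ in 0..warmup {
        f();
    }
    let a0 = ALLOCS.load(Ordering::Relaxed);
    let b0 = BYTES.load(Ordering::Relaxed);
    let d0 = DEALLOCS.load(Ordering::Relaxed);
    for _ in 0..iters {
        f();
    }
    let n = iters as f64;
    (
        (ALLOCS.load(Ordering::Relaxed) - a0) as f64 / n,
        (BYTES.load(Ordering::Relaxed) - b0) as f64 / n,
        (DEALLOCS.load(Ordering::Relaxed) - d0) as f64 / n,
    )
}

/// Stand-in scenario: one 16-byte heap buffer per iteration.
/// `black_box` keeps the optimiser from removing the allocation.
fn sample_vec_scenario() -> (f64, f64, f64) {
    measure(1_000, 10_000, || {
        let buf: Vec<u8> = Vec::with_capacity(16);
        black_box(&buf);
    })
}
```

Because the counters are process-wide, any allocation made by another thread during the measured loop would be attributed to the scenario; a single-threaded bench binary avoids that.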
## Target (per `70-risks-and-open-questions.md` R12)

> Aim for < 5 allocations per write at steady state.

The bench gates on this: any `write_message::encode` scenario at
≥ 5 allocs/op causes the binary to exit with code 1.
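
The gate rule can be sketched as a small pure function. This is an illustration under assumptions, not the harness's actual code: the `Scenario` struct and `gate_exit_code` are hypothetical names, and matching write scenarios by the `write_message::encode` name prefix is one possible way to select them.

```rust
/// R12 threshold: a write scenario at or above this value fails the gate.
const MAX_ALLOCS_PER_WRITE: f64 = 5.0;

/// One measured scenario (hypothetical shape).
struct Scenario<'a> {
    name: &'a str,
    allocs_per_op: f64,
}

/// Returns the process exit code: 1 if any `write_message::encode`
/// scenario reaches the threshold, 0 otherwise. Non-write scenarios
/// are reported but never fail the gate.
fn gate_exit_code(results: &[Scenario<'_>]) -> i32 {
    let failed = results
        .iter()
        .filter(|s| s.name.starts_with("write_message::encode"))
        .any(|s| s.allocs_per_op >= MAX_ALLOCS_PER_WRITE);
    if failed { 1 } else { 0 }
}
```

A bench binary would typically finish with `std::process::exit(gate_exit_code(&results))` after printing its table.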
## Baseline (release profile, Windows x64)
| scenario                                                    |  iters | allocs/op | bytes/op | deallocs/op |
|-------------------------------------------------------------|-------:|----------:|---------:|------------:|
| `write_message::encode` (Int32)                             | 10,000 |      2.00 |       44 |        2.00 |
| `write_message::encode` (Float32)                           | 10,000 |      2.00 |       44 |        2.00 |
| `write_message::encode` (Float64)                           | 10,000 |      2.00 |       52 |        2.00 |
| `write_message::encode` (Boolean)                           | 10,000 |      1.00 |       37 |        1.00 |
| `write_message::encode` (String, 5 chars)                   | 10,000 |      4.00 |       92 |        4.00 |
| `MxReferenceHandle::from_names`                             | 10,000 |      2.00 |       22 |        2.00 |
| `NmxSubscriptionMessage::parse_inner` (DataUpdate, Int32)   | 10,000 |      1.00 |       72 |        1.00 |
## Read
R12's < 5 allocs/write target is **already met** across the proven matrix:
- Scalar writes (Bool, Int32, Float32, Float64) sit at 1 to 2 allocs/op
(1 for Boolean, 2 for the numeric types). Where there are two, they
come from (1) the encoder's `Vec<u8>` output buffer and (2) an internal
scratch buffer in the value-encode path.
- String writes hit 4 allocs/op (output buffer, UTF-16LE conversion
buffer, the inner-length wrapper, and one more downstream).
- `MxReferenceHandle::from_names` allocates twice (one per
`compute_name_signature` call — UTF-16LE buffer for each name).
- `NmxSubscriptionMessage::parse_inner` allocates once for the
`records: Vec<NmxSubscriptionRecord>` collection.
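
To make the "one UTF-16LE buffer per name" cost concrete, here is a minimal sketch of such a conversion. It is not the body of `compute_name_signature`, which this document does not show; `utf16le_bytes` is a hypothetical helper. Sizing the buffer up front keeps the conversion to a single allocation, since a string never needs more UTF-16 code units than it has UTF-8 bytes.

```rust
/// Converts `name` to UTF-16LE bytes using exactly one heap allocation
/// (zero for an empty string, since a zero-capacity Vec does not allocate).
fn utf16le_bytes(name: &str) -> Vec<u8> {
    // Upper bound: at most one UTF-16 code unit per UTF-8 byte,
    // and each code unit is two bytes.
    let mut out = Vec::with_capacity(name.len() * 2);
    for unit in name.encode_utf16() {
        out.extend_from_slice(&unit.to_le_bytes());
    }
    out
}
```

Two such calls, one per name, would account for the two allocations reported for `from_names`, assuming nothing else on that path allocates.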
## Implications for F39
F39 (zero-copy pass) was scoped as the work to *hit* the R12 target.
With the target already met, F39's scope tightens to:
- Move the encoder's output buffer to `bytes::BytesMut` so consumers
can split it without copying. Doesn't reduce alloc count but
improves downstream zero-copy on the wire-write path.
- Cache the per-handle UTF-16LE name conversion (the two
`compute_name_signature` allocs per `from_names`) inside
`MxReferenceHandle` if the same name is registered repeatedly.
- Pool the per-frame scratch buffer at the session level so the
per-write count drops from 2 → 1 on hot paths.

These are nice-to-have optimisations rather than R12 blockers.
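
The session-level scratch idea can be sketched as follows. This is an illustration only: `Session` and `encode_i32` are hypothetical names, the four little-endian bytes stand in for a real payload rather than the actual wire format, and the initial capacity of 64 is arbitrary. The point is that `Vec::clear` keeps the existing capacity, so the buffer is allocated once and reused by every later write that fits.

```rust
/// Hypothetical session that owns a reusable scratch buffer.
struct Session {
    scratch: Vec<u8>,
}

impl Session {
    /// Allocates the scratch buffer once, up front.
    fn new() -> Self {
        Session { scratch: Vec::with_capacity(64) }
    }

    /// Writes `value` into the reused scratch buffer and returns a view
    /// of the encoded bytes. No allocation happens here as long as the
    /// payload fits in the existing capacity.
    fn encode_i32(&mut self, value: i32) -> &[u8] {
        self.scratch.clear(); // drops the contents, keeps the capacity
        self.scratch.extend_from_slice(&value.to_le_bytes());
        &self.scratch
    }
}
```

The trade-off is that the returned slice borrows the session mutably, so a caller must finish with one encoded frame before encoding the next.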
## Reproducing
```powershell
cd rust
cargo bench -p mxaccess-codec
```
Numbers are deterministic for a given release-profile build on a given
host. Some drift across hosts is expected: the warm-up and `black_box`
hints keep iteration counts stable, but they do not control the
underlying allocator's small-alloc fast-path heuristics.