[F38] mxaccess-codec: counting-allocator bench harness + R12 baseline
Hand-rolled GlobalAlloc wrapper around System that tracks allocs +
bytes + deallocs via two atomics. Each scenario runs 10k iterations
after a 1k warm-up; output is a markdown table with allocs/op,
bytes/op, deallocs/op.
Why hand-rolled (not dhat/criterion): R12 gates on a single number
("< 5 allocs/write"). dhat is heap-profiling-oriented (call-stack
attribution, JSON snapshots); criterion measures wall-clock latency
which is reported-but-not-gated per 60-roadmap.md:104. A 50-line
GlobalAlloc + atomic counters is the simplest thing that answers
the gate.
Run: `cargo bench -p mxaccess-codec`
Baseline numbers (release, Windows x64):
- Bool write: 1.00 allocs/op
- Int32 write: 2.00 allocs/op
- Float32 write: 2.00 allocs/op
- Float64 write: 2.00 allocs/op
- String write: 4.00 allocs/op (5-char string)
- Handle from_names: 2.00 allocs/op
- DataUpdate decode: 1.00 allocs/op
R12's < 5 allocs/write target is **already met** across the proven
matrix without any zero-copy work. The bench gates on this — any
write_message::encode scenario at >= 5 allocs/op exits the harness
with code 1.
Companion: `design/M6-bench-baseline.md` documents the numbers,
explains the per-scenario breakdown, and tightens F39's scope from
"hit the target" to "nice-to-have optimisations" (BytesMut output
buffer, name-signature cache, session-level scratch pool).
Workspace: 759 tests still pass; clippy --benches clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,69 @@
# M6: `mxaccess-codec` allocation-count baseline

Source: `cargo bench -p mxaccess-codec` (commit recording this file).
Harness: `crates/mxaccess-codec/benches/alloc_count.rs`, a thin
`GlobalAlloc` wrapper that increments two atomics on every `alloc` /
`dealloc` call, then runs each scenario for 10k iterations after a
1k-iteration warm-up.

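The harness source is not part of this file, so the following is only a minimal sketch of the technique described above: a `GlobalAlloc` wrapper around `System` plus a warm-up-then-measure loop. The names `CountingAlloc`, `PerOp`, `snapshot`, and `measure` are assumptions made for illustration, and the sketch uses one counter per reported column, which may not match how the real harness lays out its atomics.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Assumption: one counter per reported column; the real harness may differ.
static ALLOCS: AtomicUsize = AtomicUsize::new(0);
static BYTES: AtomicUsize = AtomicUsize::new(0);
static DEALLOCS: AtomicUsize = AtomicUsize::new(0);

/// Forwards to the system allocator and counts each call on the way through.
pub struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        DEALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Per-operation averages for one scenario.
#[derive(Debug, Clone, Copy)]
pub struct PerOp {
    pub allocs: f64,
    pub bytes: f64,
    pub deallocs: f64,
}

/// Reads the three counters at one point in time.
fn snapshot() -> (usize, usize, usize) {
    (
        ALLOCS.load(Ordering::Relaxed),
        BYTES.load(Ordering::Relaxed),
        DEALLOCS.load(Ordering::Relaxed),
    )
}

/// Runs `warmup` iterations first, then reports counter deltas averaged
/// over the following `iters` iterations.
pub fn measure<F: FnMut()>(warmup: usize, iters: usize, mut scenario: F) -> PerOp {
    for _ in 0..warmup {
        scenario();
    }
    let (a0, b0, d0) = snapshot();
    for _ in 0..iters {
        scenario();
    }
    let (a1, b1, d1) = snapshot();
    let n = iters as f64;
    PerOp {
        allocs: (a1 - a0) as f64 / n,
        bytes: (b1 - b0) as f64 / n,
        deallocs: (d1 - d0) as f64 / n,
    }
}
```

A call such as `measure(1_000, 10_000, || { std::hint::black_box(scenario_under_test()); })` would match the iteration counts quoted above; `scenario_under_test` is a placeholder. Because the default `realloc` in `GlobalAlloc` goes through `alloc` and `dealloc`, buffer growth is counted as well in this sketch.
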
## Target (per `70-risks-and-open-questions.md` R12)

> Aim for < 5 allocations per write at steady state.

The bench gates on this: any `write_message::encode` scenario at
≥ 5 allocs/op causes the binary to exit with code 1.

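The gate rule can be sketched as a pure function over the collected results. This is an illustration under assumptions, not the harness code: `ScenarioResult`, `violations`, and `gate_exit_code` are hypothetical names, matching scenarios by name prefix is a guess at how gated rows are selected, and returning 0 on success is the usual convention rather than something stated above.

```rust
/// One row of the bench output (a hypothetical shape, for illustration).
pub struct ScenarioResult {
    pub name: &'static str,
    pub allocs_per_op: f64,
}

/// R12 threshold: a write scenario at or above this value fails the gate.
pub const GATE_ALLOCS_PER_WRITE: f64 = 5.0;

/// Returns the gated scenarios that violate the threshold.
pub fn violations(results: &[ScenarioResult]) -> Vec<&ScenarioResult> {
    results
        .iter()
        // Only `write_message::encode` scenarios are gated; others are reported only.
        .filter(|r| r.name.starts_with("write_message::encode"))
        .filter(|r| r.allocs_per_op >= GATE_ALLOCS_PER_WRITE)
        .collect()
}

/// Exit code for the harness: 1 if any gated scenario fails, 0 otherwise.
pub fn gate_exit_code(results: &[ScenarioResult]) -> i32 {
    if violations(results).is_empty() {
        0
    } else {
        1
    }
}
```

A bench `main` could finish with `std::process::exit(gate_exit_code(&results))` so that CI sees the failure.
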
## Baseline (release profile, Windows x64)

| scenario                                                  |  iters | allocs/op | bytes/op | deallocs/op |
|-----------------------------------------------------------|-------:|----------:|---------:|------------:|
| `write_message::encode` (Int32)                           | 10,000 |      2.00 |       44 |        2.00 |
| `write_message::encode` (Float32)                         | 10,000 |      2.00 |       44 |        2.00 |
| `write_message::encode` (Float64)                         | 10,000 |      2.00 |       52 |        2.00 |
| `write_message::encode` (Boolean)                         | 10,000 |      1.00 |       37 |        1.00 |
| `write_message::encode` (String, 5 chars)                 | 10,000 |      4.00 |       92 |        4.00 |
| `MxReferenceHandle::from_names`                           | 10,000 |      2.00 |       22 |        2.00 |
| `NmxSubscriptionMessage::parse_inner` (DataUpdate, Int32) | 10,000 |      1.00 |       72 |        1.00 |

## Read

R12's < 5 allocs/write target is **already met** across the proven matrix:

- Scalar writes (Bool, Int32, Float32, Float64) sit at 1–2 allocs/op.
  The two allocs come from (1) the encoder's `Vec<u8>` output buffer
  and (2) an internal scratch buffer in the value-encode path.
- String writes hit 4 allocs/op (output buffer, UTF-16LE conversion
  buffer, the inner-length wrapper, and one more downstream).
- `MxReferenceHandle::from_names` allocates twice (one per
  `compute_name_signature` call, a UTF-16LE buffer for each name).
- `NmxSubscriptionMessage::parse_inner` allocates once for the
  `records: Vec<NmxSubscriptionRecord>` collection.

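To make the "one UTF-16LE buffer per name" cost concrete, here is a minimal sketch of such a conversion. `utf16le_bytes` is a hypothetical helper: the body of `compute_name_signature` is not shown in this document, so this only illustrates why each converted name costs one heap allocation.

```rust
/// Converts `name` to UTF-16LE bytes in a freshly allocated buffer.
/// Hypothetical helper; not the crate's `compute_name_signature`.
pub fn utf16le_bytes(name: &str) -> Vec<u8> {
    // A UTF-16 encoding never has more code units than the UTF-8 input has
    // bytes, so this capacity is an upper bound and the buffer never regrows.
    let mut out = Vec::with_capacity(name.len() * 2);
    for unit in name.encode_utf16() {
        out.extend_from_slice(&unit.to_le_bytes());
    }
    out
}
```

Calling a helper like this once per name gives one allocation per non-empty name, which is consistent with the 2.00 allocs/op reported for `from_names`.
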
## Implications for F39

F39 (zero-copy pass) was scoped as the work to *hit* the R12 target.
With the target already met, F39's scope tightens to:

- Move the encoder's output buffer to `bytes::BytesMut` so consumers
  can split it without copying. Doesn't reduce alloc count but
  improves downstream zero-copy on the wire-write path.
- Cache the per-handle UTF-16LE name conversion (the two
  `compute_name_signature` allocs per `from_names`) inside
  `MxReferenceHandle` if the same name is registered repeatedly.
- Pool the per-frame scratch buffer at the session level so the
  per-write count drops from 2 → 1 on hot paths.

These are nice-to-have optimisations rather than R12 blockers.

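The scratch-pooling item can be sketched with the standard library alone: a buffer owned by a long-lived object is cleared and refilled on each write, so its allocation is paid once and not per write. `ScratchEncoder` and its methods are hypothetical, and the little-endian payload is a placeholder for the real wire format.

```rust
/// Hypothetical session-level encoder that keeps one scratch buffer alive
/// between writes.
pub struct ScratchEncoder {
    scratch: Vec<u8>,
}

impl ScratchEncoder {
    /// Allocates the scratch buffer once, up front.
    pub fn with_capacity(capacity: usize) -> Self {
        Self {
            scratch: Vec::with_capacity(capacity),
        }
    }

    /// Encodes `value` into the reused buffer and returns the encoded bytes.
    /// The little-endian layout is a placeholder, not the real wire format.
    pub fn encode_i32(&mut self, value: i32) -> &[u8] {
        self.scratch.clear(); // drops the old contents but keeps the capacity
        self.scratch.extend_from_slice(&value.to_le_bytes());
        &self.scratch
    }

    /// Capacity currently held by the scratch buffer.
    pub fn capacity(&self) -> usize {
        self.scratch.capacity()
    }
}
```

As long as each payload fits in the initial capacity, repeated calls reuse the same allocation. The trade-off is that the returned slice borrows the encoder, so a caller must finish with one frame before encoding the next.
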
## Reproducing

```powershell
cd rust
cargo bench -p mxaccess-codec
```

Numbers are deterministic per release-profile build on a given host.
Numeric drift across hosts is expected: the warm-up and `black_box`
hints keep iteration counts stable, but they do not control the
underlying allocator's small-alloc fast-path heuristics.