[F38] mxaccess-codec: counting-allocator bench harness + R12 baseline

Hand-rolled GlobalAlloc wrapper around System that tracks allocs + bytes + deallocs via two atomics. Each scenario runs 10k iterations after a 1k warm-up; output is a markdown table with allocs/op, bytes/op, deallocs/op.

Why hand-rolled (not dhat/criterion): R12 gates on a single number ("< 5 allocs/write"). dhat is heap-profiling-oriented (call-stack attribution, JSON snapshots); criterion measures wall-clock latency, which is reported-but-not-gated per 60-roadmap.md:104. A 50-line GlobalAlloc + atomic counters is the simplest thing that answers the gate.

Run: `cargo bench -p mxaccess-codec`

Baseline numbers (release, Windows x64):
- Bool write: 1.00 allocs/op
- Int32 write: 2.00 allocs/op
- Float32 write: 2.00 allocs/op
- Float64 write: 2.00 allocs/op
- String write: 4.00 allocs/op (5-char string)
- Handle from_names: 2.00 allocs/op
- DataUpdate decode: 1.00 allocs/op

R12's < 5 allocs/write target is **already met** across the proven matrix without any zero-copy work. The bench gates on this: any write_message::encode scenario at >= 5 allocs/op exits the harness with code 1.

Companion: `design/M6-bench-baseline.md` documents the numbers, explains the per-scenario breakdown, and tightens F39's scope from "hit the target" to "nice-to-have optimisations" (BytesMut output buffer, name-signature cache, session-level scratch pool).

Workspace: 759 tests still pass; clippy --benches clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:45:33 -04:00
parent e79e289743
commit 71c69b80c6
3 changed files with 371 additions and 0 deletions
@@ -0,0 +1,69 @@
+# M6 — `mxaccess-codec` allocation-count baseline
+
+Source: `cargo bench -p mxaccess-codec` (commit recording this file).
+Harness: `crates/mxaccess-codec/benches/alloc_count.rs` — a thin
+`GlobalAlloc` wrapper that increments two atomics on every `alloc` /
+`dealloc` call, then runs each scenario for 10k iterations after a
+1k-iteration warm-up.
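
The harness described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `alloc_count.rs`: the counter names, the `PerOp` struct, and the `measure` and `demo` helpers are all assumptions, and it uses three counters for readability, which may differ from the real layout.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter names; the real harness may lay these out differently.
static ALLOCS: AtomicUsize = AtomicUsize::new(0);
static BYTES: AtomicUsize = AtomicUsize::new(0);
static DEALLOCS: AtomicUsize = AtomicUsize::new(0);

/// Forwards every request to `System`, counting it on the way through.
/// `realloc` and `alloc_zeroed` use the trait defaults, which call
/// `alloc`/`dealloc` here, so they are counted too.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        DEALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Per-iteration figures for one scenario.
#[derive(Debug, Clone, Copy)]
struct PerOp {
    allocs: f64,
    bytes: f64,
    deallocs: f64,
}

fn snapshot() -> (usize, usize, usize) {
    (
        ALLOCS.load(Ordering::Relaxed),
        BYTES.load(Ordering::Relaxed),
        DEALLOCS.load(Ordering::Relaxed),
    )
}

/// Runs `scenario` for `warmup` uncounted iterations, then `iters` counted ones.
fn measure<F: FnMut()>(warmup: u32, iters: u32, mut scenario: F) -> PerOp {
    for _ in 0..warmup {
        scenario();
    }
    let (a0, b0, d0) = snapshot();
    for _ in 0..iters {
        scenario();
    }
    let (a1, b1, d1) = snapshot();
    let n = f64::from(iters);
    PerOp {
        allocs: (a1 - a0) as f64 / n,
        bytes: (b1 - b0) as f64 / n,
        deallocs: (d1 - d0) as f64 / n,
    }
}

/// Stand-in scenario: one 64-byte `Vec` allocation per iteration.
fn demo() -> PerOp {
    measure(1_000, 10_000, || {
        drop(black_box(Vec::<u8>::with_capacity(64)));
    })
}
```

Because the counters are process-wide, any allocation made by another thread during the counted loop would also be included, so a sketch like this is only meaningful when the scenario runs on an otherwise idle process.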
+
+## Target (per `70-risks-and-open-questions.md` R12)
+
+> Aim for < 5 allocations per write at steady state.
+
+The bench gates on this: any `write_message::encode` scenario at
+≥ 5 allocs/op causes the binary to exit with code 1.
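
The gate itself amounts to a threshold check over the per-scenario results. A minimal sketch, assuming a hypothetical `ScenarioResult` row type with a `gated` flag that marks the `write_message::encode` scenarios (none of these names come from the harness):

```rust
/// One row of bench output (hypothetical shape).
struct ScenarioResult {
    name: &'static str,
    allocs_per_op: f64,
    /// True for scenarios the R12 gate applies to.
    gated: bool,
}

/// R12 limit: a gated scenario must stay strictly below this.
const MAX_ALLOCS_PER_WRITE: f64 = 5.0;

/// Names of gated scenarios that are at or above the limit.
fn gate_failures(results: &[ScenarioResult]) -> Vec<&'static str> {
    results
        .iter()
        .filter(|r| r.gated && r.allocs_per_op >= MAX_ALLOCS_PER_WRITE)
        .map(|r| r.name)
        .collect()
}

/// Exit code for the bench binary: 0 when the gate passes, 1 otherwise.
fn exit_code(results: &[ScenarioResult]) -> i32 {
    if gate_failures(results).is_empty() {
        0
    } else {
        1
    }
}
```

A bench `main` built this way would finish with something like `std::process::exit(exit_code(&results))` after printing the table.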
+
+## Baseline (release profile, Windows x64)
+
+| scenario                                                  |   iters | allocs/op | bytes/op | deallocs/op |
+|-----------------------------------------------------------|--------:|----------:|---------:|------------:|
+| `write_message::encode` (Int32)                           |  10,000 |      2.00 |       44 |        2.00 |
+| `write_message::encode` (Float32)                         |  10,000 |      2.00 |       44 |        2.00 |
+| `write_message::encode` (Float64)                         |  10,000 |      2.00 |       52 |        2.00 |
+| `write_message::encode` (Boolean)                         |  10,000 |      1.00 |       37 |        1.00 |
+| `write_message::encode` (String, 5 chars)                 |  10,000 |      4.00 |       92 |        4.00 |
+| `MxReferenceHandle::from_names`                           |  10,000 |      2.00 |       22 |        2.00 |
+| `NmxSubscriptionMessage::parse_inner` (DataUpdate, Int32) |  10,000 |      1.00 |       72 |        1.00 |
+
+## Read
+
+R12's < 5 allocs/write target is **already met** across the proven matrix:
+
+- Scalar writes (Bool, Int32, Float32, Float64) sit at 1–2 allocs/op.
+  Where there are two (Int32, Float32, Float64), they come from (1) the
+  encoder's `Vec<u8>` output buffer and (2) an internal scratch buffer
+  in the value-encode path.
+- String writes hit 4 allocs/op (output buffer, UTF-16LE conversion
+  buffer, the inner-length wrapper, and one more downstream).
+- `MxReferenceHandle::from_names` allocates twice (one per
+  `compute_name_signature` call — UTF-16LE buffer for each name).
+- `NmxSubscriptionMessage::parse_inner` allocates once for the
+  `records: Vec<NmxSubscriptionRecord>` collection.
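
To illustrate why a UTF-16LE conversion costs one allocation per name, here is a minimal sketch. `utf16le_bytes` is a hypothetical helper, not the body of `compute_name_signature`, which this note does not show.

```rust
/// Encodes `name` as UTF-16LE bytes using a single heap allocation.
fn utf16le_bytes(name: &str) -> Vec<u8> {
    // A string never has more UTF-16 code units than UTF-8 bytes, so
    // `len() * 2` bytes is always enough and the buffer never regrows.
    let mut out = Vec::with_capacity(name.len() * 2);
    for unit in name.encode_utf16() {
        out.extend_from_slice(&unit.to_le_bytes());
    }
    out
}
```

Under this assumption, calling the helper once per name gives two allocations for a two-name handle, which matches the count in the table.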
+
+## Implications for F39
+
+F39 (zero-copy pass) was scoped as the work to *hit* the R12 target.
+With the target already met, F39's scope tightens to:
+
+- Move the encoder's output buffer to `bytes::BytesMut` so consumers
+  can split it without copying. Doesn't reduce alloc count but
+  improves downstream zero-copy on the wire-write path.
+- Cache the per-handle UTF-16LE name conversion (the two
+  `compute_name_signature` allocs per `from_names`) inside
+  `MxReferenceHandle` if the same name is registered repeatedly.
+- Pool the per-frame scratch buffer at the session level so the
+  per-write count drops from 2 → 1 on hot paths.
+
+These are nice-to-have optimisations rather than R12 blockers.
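
The scratch-pool idea can be sketched with the standard library alone. The `Session` type and `encode_i32` method below are hypothetical, and the four-byte little-endian payload is a stand-in rather than the real wire format.

```rust
/// Hypothetical session that owns one reusable scratch buffer.
struct Session {
    scratch: Vec<u8>,
}

impl Session {
    fn new() -> Self {
        // One up-front allocation, reused by every later encode call.
        Session {
            scratch: Vec::with_capacity(64),
        }
    }

    /// Encodes `value` into the reused buffer and returns a view of it.
    fn encode_i32(&mut self, value: i32) -> &[u8] {
        // `clear` drops the old contents but keeps the capacity, so the
        // steady-state path performs no allocation for the scratch buffer.
        self.scratch.clear();
        self.scratch.extend_from_slice(&value.to_le_bytes());
        &self.scratch
    }
}
```

The trade-off is that the returned slice borrows the session, so a caller must finish with one encoded frame before starting the next.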
+
+## Reproducing
+
+```powershell
+cd rust
+cargo bench -p mxaccess-codec
+```
+
+Numbers are deterministic per release-profile build on a given host.
+Numeric drift across hosts is expected: the warm-up and `black_box`
+hints keep iteration counts stable, but they do not control the
+underlying allocator's small-alloc fast-path heuristics.