[F52.1] mxaccess-codec: BytesMut output buffer for write encoder

Adds `write_message::encode_to_bytes_mut` (and the timestamped variant)
returning a freshly-allocated `BytesMut`. Allocation count is identical
to `encode` (2 allocs/op for fixed-width scalars); the benefit is
downstream — consumers can `BytesMut::split_to` / `freeze` and forward
the body bytes to a wire-level sink without an intermediate copy.

The body builders (`encode_boolean` / `encode_fixed` / `encode_variable`
/ `encode_array`) were refactored to fill a pre-sized `&mut [u8]`
rather than each allocating their own `Vec<u8>`. The dispatcher
computes the body size up front via small `*_body_size` helpers and
resizes the destination buffer (Vec or BytesMut) once. This is also
the prerequisite refactor for F52.3.

Bench delta in `design/M6-bench-baseline.md` § F52.1; existing
`encode` row unchanged at 2 allocs/op. All 265 round-trip tests
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-06 22:46:02 -04:00
parent c7505f9570
commit 4e76b44391
6 changed files with 385 additions and 95 deletions
+36 -10
View File
@@ -15,16 +15,17 @@ The bench gates on this: any `write_message::encode` scenario at
## Baseline (release profile, Windows x64)
| scenario | iters | allocs/op | bytes/op | deallocs/op |
|-------------------------------------------|--------:|----------:|---------:|------------:|
| `write_message::encode` (Int32) | 10,000 | 2.00 | 44 | 2.00 |
| `write_message::encode` (Float32) | 10,000 | 2.00 | 44 | 2.00 |
| `write_message::encode` (Float64) | 10,000 | 2.00 | 52 | 2.00 |
| `write_message::encode` (Boolean) | 10,000 | 1.00 | 37 | 1.00 |
| `write_message::encode` (String, 5 chars) | 10,000 | 4.00 | 92 | 4.00 |
| `MxReferenceHandle::from_names` | 10,000 | 2.00 | 22 | 2.00 |
| `NmxSubscriptionMessage::parse_inner` | 10,000 | 1.00 | 72 | 1.00 |
| (DataUpdate, Int32) | | | | |
| scenario | iters | allocs/op | bytes/op | deallocs/op |
|------------------------------------------------|--------:|----------:|---------:|------------:|
| `write_message::encode` (Int32) | 10,000 | 2.00 | 44 | 2.00 |
| `write_message::encode` (Float32) | 10,000 | 2.00 | 44 | 2.00 |
| `write_message::encode` (Float64) | 10,000 | 2.00 | 52 | 2.00 |
| `write_message::encode` (Boolean) | 10,000 | 1.00 | 37 | 1.00 |
| `write_message::encode` (String, 5 chars) | 10,000 | 4.00 | 92 | 4.00 |
| `write_message::encode_to_bytes_mut` (Int32) | 10,000 | 2.00 | 44 | 2.00 |
| `MxReferenceHandle::from_names` | 10,000 | 2.00 | 22 | 2.00 |
| `NmxSubscriptionMessage::parse_inner` | 10,000 | 1.00 | 72 | 1.00 |
| (DataUpdate, Int32) | | | | |
## Read
@@ -56,6 +57,31 @@ With the target already met, F39's scope tightens to:
These are nice-to-have optimisations rather than R12 blockers.
## F52 deltas
F52 split the three F39 sub-tasks into their own commits. Each
optimisation lands with a before/after row in this section.
### F52.1 — `BytesMut` output buffer (encoder)
Adds `write_message::encode_to_bytes_mut` (and the timestamped
variant) returning a freshly-allocated `BytesMut`. Allocation count
is **identical** to the existing `encode` path — the benefit is
downstream: consumers can `BytesMut::split_to` / `freeze` and forward
the body bytes to a wire-level sink without an intermediate copy.
| scenario | before (allocs/op) | after (allocs/op) |
|----------------------------------------------|-------------------:|------------------:|
| `write_message::encode` (Int32) | 2.00 | 2.00 |
| `write_message::encode_to_bytes_mut` (Int32) | — | 2.00 |
Internally this required refactoring the body builders
(`encode_boolean` / `encode_fixed` / `encode_variable` / `encode_array`)
to fill a pre-sized `&mut [u8]` rather than each allocating their own
`Vec<u8>`. The dispatcher computes the body size up front via small
`*_body_size` helpers and resizes the destination buffer (Vec or
BytesMut) once. This is also the prerequisite refactor for F52.3.
## Reproducing
```powershell
+1 -1
View File
@@ -64,7 +64,7 @@ Array tags (`TestIntArray`, `TestBoolArray`, etc.) read live as `type_id=0 lengt
**Source:** `design/M6-bench-baseline.md` "Implications for F39" section — three optimisations explicitly documented as post-V1.
**Scope.** Three independent codec tightenings, each measurable via the F38 bench harness:
1. **`bytes::BytesMut` output buffer** on the encoder side. Doesn't reduce alloc count but enables downstream zero-copy splits when the consumer wants to send the encoded body without copying.
1. **`bytes::BytesMut` output buffer** on the encoder side. Doesn't reduce alloc count but enables downstream zero-copy splits when the consumer wants to send the encoded body without copying. ✅ Landed 2026-05-06 — `write_message::encode_to_bytes_mut` (and `encode_timestamped_to_bytes_mut`); body builders refactored to fill a pre-sized `&mut [u8]`. Bench delta in `design/M6-bench-baseline.md` § F52.1.
2. **Per-handle name-signature cache** in `MxReferenceHandle::from_names`. Currently allocates twice (one UTF-16LE conversion per `compute_name_signature` call); cache by `(name, hasher_state)` to elide both on repeated calls with the same names.
3. **Session-level scratch pool** for the per-write encode buffer. Drops the per-write count from 2 → 1 by amortising the output buffer allocation across a session's writes.