[F52.1] mxaccess-codec: BytesMut output buffer for write encoder

Adds `write_message::encode_to_bytes_mut` (and the timestamped variant) returning a freshly-allocated `BytesMut`. Allocation count is identical to `encode` (2 allocs/op for fixed-width scalars); the benefit is downstream: consumers can `BytesMut::split_to` / `freeze` and forward the body bytes to a wire-level sink without an intermediate copy.

The body builders (`encode_boolean` / `encode_fixed` / `encode_variable` / `encode_array`) were refactored to fill a pre-sized `&mut [u8]` rather than each allocating their own `Vec<u8>`. The dispatcher computes the body size up front via small `*_body_size` helpers and resizes the destination buffer (`Vec` or `BytesMut`) once. This is also the prerequisite refactor for F52.3.

Bench delta in `design/M6-bench-baseline.md` § F52.1; existing `encode` row unchanged at 2 allocs/op. All 265 round-trip tests unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:46:02 -04:00
parent c7505f9570
commit 4e76b44391
6 changed files with 385 additions and 95 deletions
@@ -15,16 +15,17 @@ The bench gates on this: any `write_message::encode` scenario at

 ## Baseline (release profile, Windows x64)

-| scenario                                  |   iters | allocs/op | bytes/op | deallocs/op |
-|-------------------------------------------|--------:|----------:|---------:|------------:|
-| `write_message::encode` (Int32)           |  10,000 |      2.00 |       44 |        2.00 |
-| `write_message::encode` (Float32)         |  10,000 |      2.00 |       44 |        2.00 |
-| `write_message::encode` (Float64)         |  10,000 |      2.00 |       52 |        2.00 |
-| `write_message::encode` (Boolean)         |  10,000 |      1.00 |       37 |        1.00 |
-| `write_message::encode` (String, 5 chars) |  10,000 |      4.00 |       92 |        4.00 |
-| `MxReferenceHandle::from_names`           |  10,000 |      2.00 |       22 |        2.00 |
-| `NmxSubscriptionMessage::parse_inner`     |  10,000 |      1.00 |       72 |        1.00 |
-| (DataUpdate, Int32)                       |         |           |          |             |
+| scenario                                       |   iters | allocs/op | bytes/op | deallocs/op |
+|------------------------------------------------|--------:|----------:|---------:|------------:|
+| `write_message::encode` (Int32)                |  10,000 |      2.00 |       44 |        2.00 |
+| `write_message::encode` (Float32)              |  10,000 |      2.00 |       44 |        2.00 |
+| `write_message::encode` (Float64)              |  10,000 |      2.00 |       52 |        2.00 |
+| `write_message::encode` (Boolean)              |  10,000 |      1.00 |       37 |        1.00 |
+| `write_message::encode` (String, 5 chars)      |  10,000 |      4.00 |       92 |        4.00 |
+| `write_message::encode_to_bytes_mut` (Int32)   |  10,000 |      2.00 |       44 |        2.00 |
+| `MxReferenceHandle::from_names`                |  10,000 |      2.00 |       22 |        2.00 |
+| `NmxSubscriptionMessage::parse_inner`          |  10,000 |      1.00 |       72 |        1.00 |
+| (DataUpdate, Int32)                            |         |           |          |             |

 ## Read

@@ -56,6 +57,31 @@ With the target already met, F39's scope tightens to:

 These are nice-to-have optimisations rather than R12 blockers.

+## F52 deltas
+
+F52 split the three F39 sub-tasks into their own commits. Each
+optimisation lands with a before/after row in this section.
+
+### F52.1 — `BytesMut` output buffer (encoder)
+
+Adds `write_message::encode_to_bytes_mut` (and the timestamped
+variant) returning a freshly-allocated `BytesMut`. Allocation count
+is **identical** to the existing `encode` path — the benefit is
+downstream: consumers can `BytesMut::split_to` / `freeze` and forward
+the body bytes to a wire-level sink without an intermediate copy.
+
+| scenario                                     | before (allocs/op) | after (allocs/op) |
+|----------------------------------------------|-------------------:|------------------:|
+| `write_message::encode` (Int32)              |               2.00 |              2.00 |
+| `write_message::encode_to_bytes_mut` (Int32) |                  — |              2.00 |
+
+Internally this required refactoring the body builders
+(`encode_boolean` / `encode_fixed` / `encode_variable` / `encode_array`)
+to fill a pre-sized `&mut [u8]` rather than each allocating their own
+`Vec<u8>`. The dispatcher computes the body size up front via small
+`*_body_size` helpers and resizes the destination buffer (`Vec` or
+`BytesMut`) once. This is also the prerequisite refactor for F52.3.
+
 ## Reproducing

 ```powershell
@@ -64,7 +64,7 @@ Array tags (`TestIntArray`, `TestBoolArray`, etc.) read live as `type_id=0 lengt
 **Source:** `design/M6-bench-baseline.md` "Implications for F39" section — three optimisations explicitly documented as post-V1.

 **Scope.** Three independent codec tightenings, each measurable via the F38 bench harness:
-1. **`bytes::BytesMut` output buffer** on the encoder side. Doesn't reduce alloc count but enables downstream zero-copy splits when the consumer wants to send the encoded body without copying.
+1. **`bytes::BytesMut` output buffer** on the encoder side. Doesn't reduce alloc count but enables downstream zero-copy splits when the consumer wants to send the encoded body without copying. ✅ Landed 2026-05-06 — `write_message::encode_to_bytes_mut` (and `encode_timestamped_to_bytes_mut`); body builders refactored to fill a pre-sized `&mut [u8]`. Bench delta in `design/M6-bench-baseline.md` § F52.1.
 2. **Per-handle name-signature cache** in `MxReferenceHandle::from_names`. Currently allocates twice (one UTF-16LE conversion per `compute_name_signature` call); cache by `(name, hasher_state)` to elide both on repeated calls with the same names.
 3. **Session-level scratch pool** for the per-write encode buffer. Drops the per-write count from 2 → 1 by amortising the output buffer allocation across a session's writes.
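
The size-then-fill shape that item 1 records as landed (a `*_body_size` helper, one resize of the destination, then a builder that writes into a pre-sized `&mut [u8]`) might look something like the sketch below. It uses only the standard library. The names mirror the commit's vocabulary, but the bodies, the `HEADER_LEN` constant, the type tag `0x03`, and the header layout are all assumptions made for illustration, not the codec's actual format.

```rust
/// Assumed header size for this sketch (a 4-byte little-endian body length).
const HEADER_LEN: usize = 4;

/// Hypothetical size helper for a fixed-width scalar body: tag byte + payload.
fn fixed_body_size(payload_len: usize) -> usize {
    1 + payload_len
}

/// Body builder that fills a pre-sized slice and performs no allocation.
fn encode_fixed(dst: &mut [u8], type_tag: u8, payload: &[u8]) {
    debug_assert_eq!(dst.len(), fixed_body_size(payload.len()));
    dst[0] = type_tag;
    dst[1..].copy_from_slice(payload);
}

/// Dispatcher: compute the size up front, grow the destination once,
/// then let the builder write in place.
fn encode_i32(out: &mut Vec<u8>, value: i32) {
    let payload = value.to_le_bytes();
    let body_len = fixed_body_size(payload.len());
    let start = out.len();
    // Single resize covering header and body.
    out.resize(start + HEADER_LEN + body_len, 0);
    let (header, body) = out[start..].split_at_mut(HEADER_LEN);
    header.copy_from_slice(&(body_len as u32).to_le_bytes());
    // 0x03 is a made-up type tag for this example.
    encode_fixed(body, 0x03, &payload);
}
```

Because the builder only sees a `&mut [u8]`, the same function can serve any destination that exposes a mutable byte slice after being resized, which is what lets one set of builders back both the `Vec` and `BytesMut` entry points.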