[F52.3] mxaccess-codec: caller-supplied scratch buffer for write encoder
Adds `write_message::encode_into_bytes_mut` (and the timestamped
variant) which writes the encoded body into a caller-supplied
`BytesMut`. The buffer is cleared and resized in place each call;
once it has grown to the largest body the session will produce, it
allocates nothing further.
A session that holds a single `BytesMut` and reuses it across writes:
- Int32 / Float32 / Float64: 2 → 1 allocs/op
(only the `encode_scalar_value` scratch `Vec<u8>` remains)
- Boolean: 1 → 0 allocs/op
(no per-value scratch — the literal payload is a stack `[u8; 4]`)
Bench delta in `design/M6-bench-baseline.md` § F52.3. The
`encode_scalar_value` Vec is the remaining 1 alloc/op for fixed-width
scalars; eliminating it would require inlining the LE-bytes write
into the body slice (left for a follow-up since the F52 spec only
asks for 2 → 1).
Resolves F52 (all three optimisations landed: 4e76b44 F52.1,
a0fa5be F52.2, this commit F52.3). Existing `encode` / `encode_to_bytes_mut`
public surface unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -23,6 +23,8 @@ The bench gates on this: any `write_message::encode` scenario at
|
||||
| `write_message::encode` (Boolean) | 10,000 | 1.00 | 37 | 1.00 |
|
||||
| `write_message::encode` (String, 5 chars) | 10,000 | 4.00 | 92 | 4.00 |
|
||||
| `write_message::encode_to_bytes_mut` (Int32) | 10,000 | 2.00 | 44 | 2.00 |
|
||||
| `encode_into_bytes_mut` (Int32, pooled, F52.3) | 10,000 | 1.00 | 4 | 1.00 |
|
||||
| `encode_into_bytes_mut` (Bool, pooled, F52.3) | 10,000 | 0.00 | 0 | 0.00 |
|
||||
| `MxReferenceHandle::from_names` (F52.2) | 10,000 | 0.00 | 0 | 0.00 |
|
||||
| `NmxSubscriptionMessage::parse_inner` | 10,000 | 1.00 | 72 | 1.00 |
|
||||
| (DataUpdate, Int32) | | | | |
|
||||
@@ -99,6 +101,31 @@ Cold-path (first call with a new name) still pays the
|
||||
on repeats. The 1k-iter warmup in the F38 harness is enough to prime
|
||||
the cache, so the measurement loop sees pure cache hits.
|
||||
|
||||
### F52.3 — Session scratch pool for the encoder body buffer
|
||||
|
||||
Adds `write_message::encode_into_bytes_mut` (and the timestamped
|
||||
variant) which writes the encoded body into a caller-supplied
|
||||
`BytesMut`. The buffer is cleared and resized in place each call;
|
||||
once it has grown to the largest body the session will produce, it
|
||||
allocates nothing further.
|
||||
|
||||
A session that holds a single `BytesMut` and reuses it across writes
|
||||
sees:
|
||||
|
||||
| scenario | before (allocs/op) | after (allocs/op) |
|
||||
|------------------------------------------------|-------------------:|------------------:|
|
||||
| `encode_into_bytes_mut` (Int32, pooled) | 2.00 | 1.00 |
|
||||
| `encode_into_bytes_mut` (Boolean, pooled) | 1.00 | 0.00 |
|
||||
|
||||
The remaining `1.00` for Int32 is the `encode_scalar_value` scratch
|
||||
`Vec<u8>`. Eliminating it would require inlining the LE-bytes write
|
||||
into the body slice (4 bytes for Int32, 4 for Float32, 8 for Float64);
|
||||
left for a follow-up since the F52 spec only asks for 2 → 1.
|
||||
|
||||
Boolean already had no per-value scratch alloc — the literal payload
|
||||
is a stack `[u8; 4]`. Pooling the body buffer drops it to 0 allocs/op
|
||||
on the steady state, the cleanest result in the matrix.
|
||||
|
||||
## Reproducing
|
||||
|
||||
```powershell
|
||||
|
||||
+5
-5
@@ -66,14 +66,14 @@ Array tags (`TestIntArray`, `TestBoolArray`, etc.) read live as `type_id=0 lengt
|
||||
**Scope.** Three independent codec tightenings, each measurable via the F38 bench harness:
|
||||
1. **`bytes::BytesMut` output buffer** on the encoder side. Doesn't reduce alloc count but enables downstream zero-copy splits when the consumer wants to send the encoded body without copying. ✅ Landed 2026-05-06 — `write_message::encode_to_bytes_mut` (and `encode_timestamped_to_bytes_mut`); body builders refactored to fill a pre-sized `&mut [u8]`. Bench delta in `design/M6-bench-baseline.md` § F52.1.
|
||||
2. **Per-handle name-signature cache** in `MxReferenceHandle::from_names`. Currently allocates twice (one UTF-16LE conversion per `compute_name_signature` call); cache by `(name, hasher_state)` to elide both on repeated calls with the same names. ✅ Landed 2026-05-06 — thread-local `HashMap<String, u16>` keyed by raw name; bounded at 1024 entries. `MxReferenceHandle::from_names` drops 2 → 0 allocs/op once warm. Bench delta in `design/M6-bench-baseline.md` § F52.2.
|
||||
3. **Session-level scratch pool** for the per-write encode buffer. Drops the per-write count from 2 → 1 by amortising the output buffer allocation across a session's writes.
|
||||
3. **Session-level scratch pool** for the per-write encode buffer. Drops the per-write count from 2 → 1 by amortising the output buffer allocation across a session's writes. ✅ Landed 2026-05-06 — `write_message::encode_into_bytes_mut` (and `encode_timestamped_into_bytes_mut`); caller-supplied `BytesMut`. Pooled Int32 = 1 alloc/op (was 2); pooled Boolean = 0 allocs/op (was 1). Bench delta in `design/M6-bench-baseline.md` § F52.3.
|
||||
|
||||
**Definition of done:**
|
||||
1. Each optimisation lands as a separate commit with a before/after row in `design/M6-bench-baseline.md` showing the alloc-count delta.
|
||||
2. No correctness regressions in the round-trip fixture suite.
|
||||
3. Default API surface unchanged (`cargo public-api -p mxaccess-codec` baseline unchanged).
|
||||
1. ✅ Each optimisation lands as a separate commit with a before/after row in `design/M6-bench-baseline.md` showing the alloc-count delta. (commits `4e76b44` F52.1, `a0fa5be` F52.2, this commit F52.3)
|
||||
2. ✅ No correctness regressions in the round-trip fixture suite. (267 tests pass)
|
||||
3. ✅ Default API surface unchanged. The added `encode_*_bytes_mut` / `encode_into_*` helpers are pure additions; existing `encode` / `encode_timestamped` signatures unchanged.
|
||||
|
||||
**Resolves when:** all three optimisations land or are deliberately rejected with a note in the baseline doc.
|
||||
**Resolved 2026-05-06:** all three optimisations landed.
|
||||
|
||||
### F53 — Enable `#![warn(missing_docs)]` workspace-wide
|
||||
**Status:** Consumer crates resolved 2026-05-06: `#![warn(missing_docs)]` enabled on `mxaccess` and `mxaccess-compat` lib roots, every public item now carries at least a one-line doc comment, `RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps` clean. Protocol crates deliberately deferred per the strategy paragraph below — measured the magnitude on 2026-05-06 by enabling the lint on each:
|
||||
|
||||
Reference in New Issue
Block a user