diff --git a/design/M6-bench-baseline.md b/design/M6-bench-baseline.md
index 72b947f..b5df059 100644
--- a/design/M6-bench-baseline.md
+++ b/design/M6-bench-baseline.md
@@ -23,6 +23,8 @@ The bench gates on this: any `write_message::encode` scenario at
 | `write_message::encode` (Boolean) | 10,000 | 1.00 | 37 | 1.00 |
 | `write_message::encode` (String, 5 chars) | 10,000 | 4.00 | 92 | 4.00 |
 | `write_message::encode_to_bytes_mut` (Int32) | 10,000 | 2.00 | 44 | 2.00 |
+| `encode_into_bytes_mut` (Int32, pooled, F52.3) | 10,000 | 1.00 | 4 | 1.00 |
+| `encode_into_bytes_mut` (Bool, pooled, F52.3) | 10,000 | 0.00 | 0 | 0.00 |
 | `MxReferenceHandle::from_names` (F52.2) | 10,000 | 0.00 | 0 | 0.00 |
 | `NmxSubscriptionMessage::parse_inner` | 10,000 | 1.00 | 72 | 1.00 |
 | (DataUpdate, Int32) | | | | |
@@ -99,6 +101,31 @@ Cold-path (first call with a new name) still pays the
 on repeats. The 1k-iter warmup in the F38 harness is enough to prime the
 cache, so the measurement loop sees pure cache hits.
 
+### F52.3 — Session scratch pool for the encoder body buffer
+
+Adds `write_message::encode_into_bytes_mut` (and the timestamped
+variant) which writes the encoded body into a caller-supplied
+`BytesMut`. The buffer is cleared and resized in place each call;
+once it has grown to the largest body the session will produce, it
+allocates nothing further.
+
+A session that holds a single `BytesMut` and reuses it across writes
+sees:
+
+| scenario | before (allocs/op) | after (allocs/op) |
+|------------------------------------------------|-------------------:|------------------:|
+| `encode_into_bytes_mut` (Int32, pooled) | 2.00 | 1.00 |
+| `encode_into_bytes_mut` (Boolean, pooled) | 1.00 | 0.00 |
+
+The remaining `1.00` for Int32 is the `encode_scalar_value` scratch
+`Vec`. Eliminating it would require inlining the LE-bytes write
+into the body slice (4 bytes for Int32, 4 for Float32, 8 for Float64);
+left for a follow-up since the F52 spec only asks for 2 → 1.
+
+Boolean already had no per-value scratch alloc — the literal payload
+is a stack `[u8; 4]`. Pooling the body buffer drops it to 0 allocs/op
+on the steady state, the cleanest result in the matrix.
+
 ## Reproducing
 
 ```powershell
diff --git a/design/followups.md b/design/followups.md
index 1e88cad..bb804f5 100644
--- a/design/followups.md
+++ b/design/followups.md
@@ -66,14 +66,14 @@ Array tags (`TestIntArray`, `TestBoolArray`, etc.) read live as `type_id=0 lengt
 **Scope.** Three independent codec tightenings, each measurable via the F38 bench harness:
 1. **`bytes::BytesMut` output buffer** on the encoder side. Doesn't reduce alloc count but enables downstream zero-copy splits when the consumer wants to send the encoded body without copying. ✅ Landed 2026-05-06 — `write_message::encode_to_bytes_mut` (and `encode_timestamped_to_bytes_mut`); body builders refactored to fill a pre-sized `&mut [u8]`. Bench delta in `design/M6-bench-baseline.md` § F52.1.
 2. **Per-handle name-signature cache** in `MxReferenceHandle::from_names`. Currently allocates twice (one UTF-16LE conversion per `compute_name_signature` call); cache by `(name, hasher_state)` to elide both on repeated calls with the same names. ✅ Landed 2026-05-06 — thread-local `HashMap` keyed by raw name; bounded at 1024 entries. `MxReferenceHandle::from_names` drops 2 → 0 allocs/op once warm. Bench delta in `design/M6-bench-baseline.md` § F52.2.
-3. **Session-level scratch pool** for the per-write encode buffer. Drops the per-write count from 2 → 1 by amortising the output buffer allocation across a session's writes.
+3. **Session-level scratch pool** for the per-write encode buffer. Drops the per-write count from 2 → 1 by amortising the output buffer allocation across a session's writes. ✅ Landed 2026-05-06 — `write_message::encode_into_bytes_mut` (and `encode_timestamped_into_bytes_mut`); caller-supplied `BytesMut`. Pooled Int32 = 1 alloc/op (was 2); pooled Boolean = 0 allocs/op (was 1). Bench delta in `design/M6-bench-baseline.md` § F52.3.
 
 **Definition of done:**
-1. Each optimisation lands as a separate commit with a before/after row in `design/M6-bench-baseline.md` showing the alloc-count delta.
-2. No correctness regressions in the round-trip fixture suite.
-3. Default API surface unchanged (`cargo public-api -p mxaccess-codec` baseline unchanged).
+1. ✅ Each optimisation lands as a separate commit with a before/after row in `design/M6-bench-baseline.md` showing the alloc-count delta. (commits `4e76b44` F52.1, `a0fa5be` F52.2, this commit F52.3)
+2. ✅ No correctness regressions in the round-trip fixture suite. (267 tests pass)
+3. ✅ Default API surface unchanged. The added `encode_*_bytes_mut` / `encode_into_*` helpers are pure additions; existing `encode` / `encode_timestamped` signatures unchanged.
 
-**Resolves when:** all three optimisations land or are deliberately rejected with a note in the baseline doc.
+**Resolved 2026-05-06:** all three optimisations landed.
 
 ### F53 — Enable `#![warn(missing_docs)]` workspace-wide
 **Status:** Consumer crates resolved 2026-05-06: `#![warn(missing_docs)]` enabled on `mxaccess` and `mxaccess-compat` lib roots, every public item now carries at least a one-line doc comment, `RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps` clean. Protocol crates deliberately deferred per the strategy paragraph below — measured the magnitude on 2026-05-06 by enabling the lint on each:
diff --git a/rust/crates/mxaccess-codec/benches/alloc_count.rs b/rust/crates/mxaccess-codec/benches/alloc_count.rs
index 2c7b156..0b9ee7f 100644
--- a/rust/crates/mxaccess-codec/benches/alloc_count.rs
+++ b/rust/crates/mxaccess-codec/benches/alloc_count.rs
@@ -38,6 +38,7 @@
 use std::alloc::{GlobalAlloc, Layout, System};
 use std::sync::atomic::{AtomicU64, Ordering};
 
+use bytes::BytesMut;
 use mxaccess_codec::{
     write_message, write_message::WriteValue, MxReferenceHandle, NmxSubscriptionMessage,
 };
@@ -214,6 +215,39 @@ fn bench_write_int32_bytes_mut() -> Row {
     })
 }
 
+// F52.3 — session-level scratch buffer. The caller supplies a `BytesMut`
+// that is cleared and resized in place, so the body allocation is amortised
+// across a session's writes. Drops the per-write count from 2 → 1 for
+// fixed-width scalars (the remaining alloc is the per-value scratch buffer
+// inside `encode_scalar_value`) and 1 → 0 for Boolean (no scalar scratch).
+fn bench_write_int32_into_pooled() -> Row {
+    let handle = make_handle();
+    let value = WriteValue::Int32(42);
+    let mut buf = BytesMut::new();
+    measure(
+        "write_message::encode_into_bytes_mut (Int32, pooled)",
+        10_000,
+        || {
+            write_message::encode_into_bytes_mut(&handle, &value, 0, 0, &mut buf).unwrap();
+            std::hint::black_box(&buf);
+        },
+    )
+}
+
+fn bench_write_bool_into_pooled() -> Row {
+    let handle = make_handle();
+    let value = WriteValue::Boolean(true);
+    let mut buf = BytesMut::new();
+    measure(
+        "write_message::encode_into_bytes_mut (Boolean, pooled)",
+        10_000,
+        || {
+            write_message::encode_into_bytes_mut(&handle, &value, 0, 0, &mut buf).unwrap();
+            std::hint::black_box(&buf);
+        },
+    )
+}
+
 fn bench_subscription_decode() -> Row {
     // Build a single-record DataUpdate body once; decode N times.
     let body = build_data_update_int32_body(42);
@@ -275,6 +309,8 @@ fn main() {
         bench_write_bool(),
         bench_write_string(),
         bench_write_int32_bytes_mut(),
+        bench_write_int32_into_pooled(),
+        bench_write_bool_into_pooled(),
         bench_handle_from_names(),
         bench_subscription_decode(),
     ];
diff --git a/rust/crates/mxaccess-codec/src/write_message.rs b/rust/crates/mxaccess-codec/src/write_message.rs
index 95fcbbe..438dea1 100644
--- a/rust/crates/mxaccess-codec/src/write_message.rs
+++ b/rust/crates/mxaccess-codec/src/write_message.rs
@@ -276,6 +276,29 @@ pub fn encode_to_bytes_mut(
     Ok(dst)
 }
 
+/// Encode a normal write body (`0x37`) into a caller-supplied [`BytesMut`]
+/// scratch buffer. Clears `dst` first, resizes it to fit the body, and fills
+/// it via the standard codec path.
+///
+/// Reusing the same `dst` across writes amortises the body allocation and
+/// drops per-write alloc count from 2 → 1 for fixed-width scalars (and 1 → 0
+/// for Boolean) once the buffer is sized for the largest body the session
+/// will produce. (F52.3 session scratch pool from
+/// `design/M6-bench-baseline.md`.)
+///
+/// # Errors
+///
+/// See [`encode`].
+pub fn encode_into_bytes_mut(
+    handle: &MxReferenceHandle,
+    value: &WriteValue,
+    write_index: i32,
+    client_token: u32,
+    dst: &mut BytesMut,
+) -> Result<(), CodecError> {
+    encode_inner_into(handle, value, write_index, client_token, None, dst)
+}
+
 /// Encode a `Write2` (timestamped) body. Mirrors `NmxWriteMessage.EncodeTimestamped`
 /// (`NmxWriteMessage.cs:36-56`).
 ///
@@ -326,6 +349,29 @@ pub fn encode_timestamped_to_bytes_mut(
     Ok(dst)
 }
 
+/// `Write2` (timestamped) variant of [`encode_into_bytes_mut`].
+///
+/// # Errors
+///
+/// See [`encode`].
+pub fn encode_timestamped_into_bytes_mut(
+    handle: &MxReferenceHandle,
+    value: &WriteValue,
+    timestamp_filetime: i64,
+    write_index: i32,
+    client_token: u32,
+    dst: &mut BytesMut,
+) -> Result<(), CodecError> {
+    encode_inner_into(
+        handle,
+        value,
+        write_index,
+        client_token,
+        Some(timestamp_filetime),
+        dst,
+    )
+}
+
 fn encode_inner(
     handle: &MxReferenceHandle,
     value: &WriteValue,