Commit Graph

4 Commits

Author SHA1 Message Date
Joseph Doherty ceeaeefa71 [F52.3] mxaccess-codec: caller-supplied scratch buffer for write encoder
rust / build / test / clippy / fmt (push) Has been cancelled
rust / cargo public-api drift check (F41) (push) Has been cancelled
Adds `write_message::encode_into_bytes_mut` (and the timestamped
variant) which writes the encoded body into a caller-supplied
`BytesMut`. The buffer is cleared and resized in place each call;
once it has grown to the largest body the session will produce, it
allocates nothing further.

A session that holds a single `BytesMut` and reuses it across writes:

  - Int32 / Float32 / Float64: 2 → 1 allocs/op
    (only the `encode_scalar_value` scratch `Vec<u8>` remains)
  - Boolean: 1 → 0 allocs/op
    (no per-value scratch — the literal payload is a stack `[u8; 4]`)

Bench delta in `design/M6-bench-baseline.md` § F52.3. The
`encode_scalar_value` Vec is the remaining 1 alloc/op for fixed-width
scalars; eliminating it would require inlining the LE-bytes write
into the body slice (left for a follow-up since the F52 spec only
asks for 2 → 1).

Resolves F52 (all three optimisations landed: 4e76b44 F52.1,
a0fa5be F52.2, this commit F52.3). Existing `encode` / `encode_to_bytes_mut`
public surface unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:53:07 -04:00
Joseph Doherty a0fa5bedfd [F52.2] mxaccess-codec: thread-local name-signature cache
Adds a thread-local `HashMap<String, u16>` cache inside
`compute_name_signature`. Repeated calls with the same name (the hot
path inside `MxReferenceHandle::from_names`) skip the `to_lowercase`
allocation and the CRC-16/IBM walk entirely. Bounded at 1024 entries
per thread; on overflow the cache is cleared rather than evicted LRU
— any sane workload re-fills only the names it actively uses.

`MxReferenceHandle::from_names` drops from 2 → 0 allocs/op once warm
(bench delta in `design/M6-bench-baseline.md` § F52.2). Cold-path
behaviour is unchanged: first call with a new name still pays the
`to_lowercase` + cache-key `String` allocations.

Two new tests pin the cache: cache-hit returns the same value as
cold-compute, and cache overflow doesn't break correctness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:50:07 -04:00
Joseph Doherty 4e76b44391 [F52.1] mxaccess-codec: BytesMut output buffer for write encoder
Adds `write_message::encode_to_bytes_mut` (and the timestamped variant)
returning a freshly-allocated `BytesMut`. Allocation count is identical
to `encode` (2 allocs/op for fixed-width scalars); the benefit is
downstream — consumers can `BytesMut::split_to` / `freeze` and forward
the body bytes to a wire-level sink without an intermediate copy.

The body builders (`encode_boolean` / `encode_fixed` / `encode_variable`
/ `encode_array`) were refactored to fill a pre-sized `&mut [u8]`
rather than each allocating their own `Vec<u8>`. The dispatcher
computes the body size up front via small `*_body_size` helpers and
resizes the destination buffer (Vec or BytesMut) once. This is also
the prerequisite refactor for F52.3.

Bench delta in `design/M6-bench-baseline.md` § F52.1; existing
`encode` row unchanged at 2 allocs/op. All 265 round-trip tests
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:46:02 -04:00
Joseph Doherty 71c69b80c6 [F38] mxaccess-codec: counting-allocator bench harness + R12 baseline
Hand-rolled GlobalAlloc wrapper around System that tracks allocs +
bytes + deallocs via two atomics. Each scenario runs 10k iterations
after a 1k warm-up; output is a markdown table with allocs/op,
bytes/op, deallocs/op.

Why hand-rolled (not dhat/criterion): R12 gates on a single number
("< 5 allocs/write"). dhat is heap-profiling-oriented (call-stack
attribution, JSON snapshots); criterion measures wall-clock latency
which is reported-but-not-gated per 60-roadmap.md:104. A 50-line
GlobalAlloc + atomic counters is the simplest thing that answers
the gate.

Run: `cargo bench -p mxaccess-codec`

Baseline numbers (release, Windows x64):
- Bool write:    1.00 allocs/op
- Int32 write:   2.00 allocs/op
- Float32 write: 2.00 allocs/op
- Float64 write: 2.00 allocs/op
- String write:  4.00 allocs/op (5-char string)
- Handle from_names: 2.00 allocs/op
- DataUpdate decode: 1.00 alloc/op

R12's < 5 allocs/write target is **already met** across the proven
matrix without any zero-copy work. The bench gates on this — any
write_message::encode scenario at >= 5 allocs/op exits the harness
with code 1.

Companion: `design/M6-bench-baseline.md` documents the numbers,
explains the per-scenario breakdown, and tightens F39's scope from
"hit the target" to "nice-to-have optimisations" (BytesMut output
buffer, name-signature cache, session-level scratch pool).

Workspace: 759 tests still pass; clippy --benches clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:45:33 -04:00