[F52.2] mxaccess-codec: thread-local name-signature cache
Adds a thread-local `HashMap<String, u16>` cache inside `compute_name_signature`. Repeated calls with the same name (the hot path inside `MxReferenceHandle::from_names`) skip the `to_lowercase` allocation and the CRC-16/IBM walk entirely.

Bounded at 1024 entries per thread; on overflow the cache is cleared rather than evicted in LRU order, since any sane workload re-fills only the names it actively uses.

`MxReferenceHandle::from_names` drops from 2 to 0 allocs/op once warm (bench delta in `design/M6-bench-baseline.md` § F52.2). Cold-path behaviour is unchanged: the first call with a new name still pays the `to_lowercase` + cache-key `String` allocations.

Two new tests pin the cache: a cache hit returns the same value as a cold compute, and cache overflow doesn't break correctness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -23,7 +23,7 @@ The bench gates on this: any `write_message::encode` scenario at
| `write_message::encode` (Boolean) | 10,000 | 1.00 | 37 | 1.00 |
| `write_message::encode` (String, 5 chars) | 10,000 | 4.00 | 92 | 4.00 |
| `write_message::encode_to_bytes_mut` (Int32) | 10,000 | 2.00 | 44 | 2.00 |
-| `MxReferenceHandle::from_names` | 10,000 | 2.00 | 22 | 2.00 |
+| `MxReferenceHandle::from_names` (F52.2) | 10,000 | 0.00 | 0 | 0.00 |
| `NmxSubscriptionMessage::parse_inner` | 10,000 | 1.00 | 72 | 1.00 |
| (DataUpdate, Int32) | | | | |

@@ -82,6 +82,23 @@ to fill a pre-sized `&mut [u8]` rather than each allocating their own
`*_body_size` helpers and resizes the destination buffer (Vec or
BytesMut) once. This is also the prerequisite refactor for F52.3.

### F52.2 — Per-handle name-signature cache

Adds a thread-local `HashMap<String, u16>` cache inside
`compute_name_signature`. Repeated calls with the same name (the hot
path inside `MxReferenceHandle::from_names` when handles are
constructed many times) skip the `to_lowercase` allocation entirely.
Capped at 1024 entries; on overflow the thread's cache is cleared.

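The cache described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the crate's actual code: `signature_uncached`, `SIGNATURES`, and `MAX_ENTRIES` are hypothetical names, and the hash is assumed to be CRC-16/ARC (the variant commonly called CRC-16/IBM) over the lowercased UTF-8 bytes, which may differ from the codec's real input encoding.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Per-thread entry cap, taken from the text above.
const MAX_ENTRIES: usize = 1024;

thread_local! {
    /// Raw name -> signature. One map per thread, so no locking is needed.
    static SIGNATURES: RefCell<HashMap<String, u16>> = RefCell::new(HashMap::new());
}

/// Stand-in hash (assumption): CRC-16/ARC over the lowercased UTF-8 bytes.
/// The real codec's input encoding and CRC parameters may differ.
fn signature_uncached(name: &str) -> u16 {
    let lowered = name.to_lowercase(); // cold-path allocation: lowercased copy
    let mut crc: u16 = 0;
    for &byte in lowered.as_bytes() {
        crc ^= u16::from(byte);
        for _ in 0..8 {
            // Reflected form of polynomial 0x8005.
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0xA001 } else { crc >> 1 };
        }
    }
    crc
}

/// Cached lookup keyed by the raw (not lowercased) name.
pub fn compute_name_signature(name: &str) -> u16 {
    SIGNATURES.with(|cell| {
        let mut cache = cell.borrow_mut();
        // Warm path: `get` accepts a `&str`, so no key `String` is built.
        if let Some(&sig) = cache.get(name) {
            return sig;
        }
        let sig = signature_uncached(name);
        // On overflow, drop everything instead of tracking LRU order.
        if cache.len() >= MAX_ENTRIES {
            cache.clear();
        }
        cache.insert(name.to_owned(), sig); // cold-path allocation: cache key
        sig
    })
}
```

Because the key is the raw name, `"TestInt"` and `"TESTINT"` occupy separate entries that map to the same signature. Clearing on overflow keeps the hit path free of any bookkeeping; the cost is one burst of cold computes after a clear.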
| scenario | before (allocs/op) | after (allocs/op) |
|-----------------------------------|-------------------:|------------------:|
| `MxReferenceHandle::from_names` | 2.00 | 0.00 |

Cold-path (first call with a new name) still pays the
`to_lowercase` + cache-key `String` allocations — the cache only helps
on repeats. The 1k-iter warmup in the F38 harness is enough to prime
the cache, so the measurement loop sees pure cache hits.

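To make the warmup point concrete, here is one way an allocs/op figure could be measured. It is a sketch only and is not the F38 harness: `CountingAlloc`, `ALLOCS`, `allocs_per_op`, and `warm_cache_allocs` are hypothetical names, and the counter treats every `alloc` request on the current thread as one allocation (reallocations that fall back to `alloc` are counted too).

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;
use std::collections::HashMap;

thread_local! {
    /// Allocations made by the current thread. Const-initialised and without
    /// a destructor, so reading it inside the allocator does not allocate.
    static ALLOCS: Cell<u64> = const { Cell::new(0) };
}

/// Forwards to the system allocator and counts each allocation request.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.with(|n| n.set(n.get() + 1));
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `op` `warmup` times unmeasured, then returns the mean number of
/// allocations per call over `iters` measured calls.
fn allocs_per_op(warmup: u32, iters: u32, mut op: impl FnMut()) -> f64 {
    for _ in 0..warmup {
        op();
    }
    let before = ALLOCS.with(|n| n.get());
    for _ in 0..iters {
        op();
    }
    let after = ALLOCS.with(|n| n.get());
    (after - before) as f64 / f64::from(iters)
}

/// A cache miss allocates the key once, during warmup; every measured call
/// afterwards is a hit and allocates nothing.
fn warm_cache_allocs() -> f64 {
    let mut cache: HashMap<String, u16> = HashMap::new();
    allocs_per_op(1_000, 10_000, || {
        if cache.get("TestInt").is_none() {
            cache.insert("TestInt".to_owned(), 0); // placeholder signature
        }
    })
}
```

With the single miss absorbed by the warmup loop, `warm_cache_allocs()` reports zero allocations per call, which is the shape of the 2.00 to 0.00 change in the table. Passing `0` as the warmup count would instead fold the cold-path allocations into the average.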
## Reproducing

```powershell

@@ -65,7 +65,7 @@ Array tags (`TestIntArray`, `TestBoolArray`, etc.) read live as `type_id=0 lengt

**Scope.** Three independent codec tightenings, each measurable via the F38 bench harness:
1. **`bytes::BytesMut` output buffer** on the encoder side. Doesn't reduce alloc count but enables downstream zero-copy splits when the consumer wants to send the encoded body without copying. ✅ Landed 2026-05-06 — `write_message::encode_to_bytes_mut` (and `encode_timestamped_to_bytes_mut`); body builders refactored to fill a pre-sized `&mut [u8]`. Bench delta in `design/M6-bench-baseline.md` § F52.1.
-2. **Per-handle name-signature cache** in `MxReferenceHandle::from_names`. Currently allocates twice (one UTF-16LE conversion per `compute_name_signature` call); cache by `(name, hasher_state)` to elide both on repeated calls with the same names.
+2. **Per-handle name-signature cache** in `MxReferenceHandle::from_names`. Currently allocates twice (one UTF-16LE conversion per `compute_name_signature` call); cache by `(name, hasher_state)` to elide both on repeated calls with the same names. ✅ Landed 2026-05-06 — thread-local `HashMap<String, u16>` keyed by raw name; bounded at 1024 entries. `MxReferenceHandle::from_names` drops 2 → 0 allocs/op once warm. Bench delta in `design/M6-bench-baseline.md` § F52.2.
3. **Session-level scratch pool** for the per-write encode buffer. Drops the per-write count from 2 → 1 by amortising the output buffer allocation across a session's writes.

**Definition of done:**
