[F38] mxaccess-codec: counting-allocator bench harness + R12 baseline
Hand-rolled GlobalAlloc wrapper around System that tracks allocs +
bytes + deallocs via two atomics. Each scenario runs 10k iterations
after a 1k warm-up; output is a markdown table with allocs/op,
bytes/op, deallocs/op.
Why hand-rolled (not dhat/criterion): R12 gates on a single number
("< 5 allocs/write"). dhat is heap-profiling-oriented (call-stack
attribution, JSON snapshots); criterion measures wall-clock latency
which is reported-but-not-gated per 60-roadmap.md:104. A 50-line
GlobalAlloc + atomic counters is the simplest thing that answers
the gate.
Run: `cargo bench -p mxaccess-codec`
Baseline numbers (release, Windows x64):
- Bool write: 1.00 allocs/op
- Int32 write: 2.00 allocs/op
- Float32 write: 2.00 allocs/op
- Float64 write: 2.00 allocs/op
- String write: 4.00 allocs/op (5-char string)
- Handle from_names: 2.00 allocs/op
- DataUpdate decode: 1.00 alloc/op
R12's < 5 allocs/write target is **already met** across the proven
matrix without any zero-copy work. The bench gates on this — any
write_message::encode scenario at >= 5 allocs/op exits the harness
with code 1.
Companion: `design/M6-bench-baseline.md` documents the numbers,
explains the per-scenario breakdown, and tightens F39's scope from
"hit the target" to "nice-to-have optimisations" (BytesMut output
buffer, name-signature cache, session-level scratch pool).
Workspace: 759 tests still pass; clippy --benches clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,69 @@
|
||||
# M6 — `mxaccess-codec` allocation-count baseline
|
||||
|
||||
Source: `cargo bench -p mxaccess-codec` (commit recording this file).
|
||||
Harness: `crates/mxaccess-codec/benches/alloc_count.rs` — a thin
|
||||
`GlobalAlloc` wrapper that increments two atomics on every `alloc` /
|
||||
`dealloc` call, then runs each scenario for 10k iterations after a
|
||||
1k-iteration warm-up.
|
||||
|
||||
## Target (per `70-risks-and-open-questions.md` R12)
|
||||
|
||||
> Aim for < 5 allocations per write at steady state.
|
||||
|
||||
The bench gates on this: any `write_message::encode` scenario at
|
||||
≥ 5 allocs/op causes the binary to exit with code 1.
|
||||
|
||||
## Baseline (release profile, Windows x64)
|
||||
|
||||
| scenario | iters | allocs/op | bytes/op | deallocs/op |
|
||||
|-------------------------------------------|--------:|----------:|---------:|------------:|
|
||||
| `write_message::encode` (Int32) | 10,000 | 2.00 | 44 | 2.00 |
|
||||
| `write_message::encode` (Float32) | 10,000 | 2.00 | 44 | 2.00 |
|
||||
| `write_message::encode` (Float64) | 10,000 | 2.00 | 52 | 2.00 |
|
||||
| `write_message::encode` (Boolean) | 10,000 | 1.00 | 37 | 1.00 |
|
||||
| `write_message::encode` (String, 5 chars) | 10,000 | 4.00 | 92 | 4.00 |
|
||||
| `MxReferenceHandle::from_names` | 10,000 | 2.00 | 22 | 2.00 |
|
||||
| `NmxSubscriptionMessage::parse_inner` | 10,000 | 1.00 | 72 | 1.00 |
|
||||
| (DataUpdate, Int32) | | | | |
|
||||
|
||||
## Read
|
||||
|
||||
R12's < 5 allocs/write target is **already met** across the proven matrix:
|
||||
|
||||
- Scalar writes (Bool, Int32, Float32, Float64) sit at 1–2 allocs/op.
|
||||
The two allocs come from (1) the encoder's `Vec<u8>` output buffer
|
||||
and (2) an internal scratch buffer in the value-encode path.
|
||||
- String writes hit 4 allocs/op (output buffer, UTF-16LE conversion
|
||||
buffer, the inner-length wrapper, and one more downstream).
|
||||
- `MxReferenceHandle::from_names` allocates twice (one per
|
||||
`compute_name_signature` call — UTF-16LE buffer for each name).
|
||||
- `NmxSubscriptionMessage::parse_inner` allocates once for the
|
||||
`records: Vec<NmxSubscriptionRecord>` collection.
|
||||
|
||||
## Implications for F39
|
||||
|
||||
F39 (zero-copy pass) was scoped as the work to *hit* the R12 target.
|
||||
With the target already met, F39's scope tightens to:
|
||||
|
||||
- Move the encoder's output buffer to `bytes::BytesMut` so consumers
|
||||
can split it without copying. Doesn't reduce alloc count but
|
||||
improves downstream zero-copy on the wire-write path.
|
||||
- Cache the per-handle UTF-16LE name conversion (the two
|
||||
`compute_name_signature` allocs per `from_names`) inside
|
||||
`MxReferenceHandle` if the same name is registered repeatedly.
|
||||
- Pool the per-frame scratch buffer at the session level so the
|
||||
per-write count drops from 2 → 1 on hot paths.
|
||||
|
||||
These are nice-to-have optimisations rather than R12 blockers.
|
||||
|
||||
## Reproducing
|
||||
|
||||
```powershell
|
||||
cd rust
|
||||
cargo bench -p mxaccess-codec
|
||||
```
|
||||
|
||||
Numbers are deterministic per release-profile build on a given host.
|
||||
Numeric drift across hosts is expected (the warm-up + black_box hints
|
||||
keep iteration counts stable, not the underlying allocator's
|
||||
small-alloc fast-path heuristics).
|
||||
@@ -15,5 +15,9 @@ thiserror = { workspace = true }
|
||||
default = []
|
||||
serde = []
|
||||
|
||||
[[bench]]
|
||||
name = "alloc_count"
|
||||
harness = false
|
||||
|
||||
[lints]
|
||||
workspace = true
|
||||
|
||||
@@ -0,0 +1,298 @@
|
||||
//! F38 — counting-allocator bench for `mxaccess-codec`.
|
||||
//!
|
||||
//! Measures allocation count + bytes-allocated for the proven
|
||||
//! encode/decode matrix per `design/70-risks-and-open-questions.md`
|
||||
//! R12 (< 5 allocs per write at steady state). The harness wraps the
|
||||
//! global allocator with a [`CountingAllocator`] that tracks
|
||||
//! per-call counts; each scenario records pre-state, runs N
|
||||
//! iterations, and reports `(alloc_count, bytes_allocated) / N`.
|
||||
//!
|
||||
//! Output is the source of truth for `design/M6-bench-baseline.md`.
|
||||
//!
|
||||
//! ## Why hand-rolled (not `dhat` / `criterion`)
|
||||
//!
|
||||
//! - `dhat` is heap-profiling oriented (snapshots, call-stack
|
||||
//! attribution); for "did this op allocate < 5 times?" the simpler
|
||||
//! approach is a thin `GlobalAlloc` wrapper that increments two
|
||||
//! atomics. No call-stack capture, no JSON output to post-process.
|
||||
//! - `criterion` measures wall-clock latency; per `60-roadmap.md:104`,
|
||||
//! latency is reported but not gating in V1. Allocation count IS
|
||||
//! the gating metric for M6 DoD bullet 3.
|
||||
//!
|
||||
//! ## Run
|
||||
//!
|
||||
//! ```text
|
||||
//! cargo bench -p mxaccess-codec
|
||||
//! ```
|
||||
//!
|
||||
//! Each scenario runs in release mode by default (cargo bench
|
||||
//! profile = `bench` which inherits release).
|
||||
|
||||
#![allow(
|
||||
clippy::unwrap_used,
|
||||
clippy::expect_used,
|
||||
clippy::cast_possible_truncation,
|
||||
clippy::cast_sign_loss
|
||||
)]
|
||||
|
||||
use std::alloc::{GlobalAlloc, Layout, System};
|
||||
use std::sync::atomic::{AtomicU64, Ordering};
|
||||
|
||||
use mxaccess_codec::{
|
||||
MxReferenceHandle, NmxSubscriptionMessage, write_message,
|
||||
write_message::WriteValue,
|
||||
};
|
||||
|
||||
// ---- counting allocator -------------------------------------------------
|
||||
|
||||
struct CountingAllocator;
|
||||
|
||||
static ALLOC_COUNT: AtomicU64 = AtomicU64::new(0);
|
||||
static ALLOC_BYTES: AtomicU64 = AtomicU64::new(0);
|
||||
static DEALLOC_COUNT: AtomicU64 = AtomicU64::new(0);
|
||||
|
||||
unsafe impl GlobalAlloc for CountingAllocator {
|
||||
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
|
||||
ALLOC_COUNT.fetch_add(1, Ordering::Relaxed);
|
||||
ALLOC_BYTES.fetch_add(layout.size() as u64, Ordering::Relaxed);
|
||||
// SAFETY: forwarding to the system allocator with the same layout.
|
||||
unsafe { System.alloc(layout) }
|
||||
}
|
||||
|
||||
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
|
||||
DEALLOC_COUNT.fetch_add(1, Ordering::Relaxed);
|
||||
// SAFETY: forwarding to the system allocator with the same ptr+layout.
|
||||
unsafe { System.dealloc(ptr, layout) }
|
||||
}
|
||||
}
|
||||
|
||||
#[global_allocator]
|
||||
static GLOBAL: CountingAllocator = CountingAllocator;
|
||||
|
||||
// ---- scenario harness ---------------------------------------------------
|
||||
|
||||
#[derive(Debug, Clone, Copy)]
|
||||
struct Snapshot {
|
||||
allocs: u64,
|
||||
bytes: u64,
|
||||
deallocs: u64,
|
||||
}
|
||||
|
||||
fn snapshot() -> Snapshot {
|
||||
Snapshot {
|
||||
allocs: ALLOC_COUNT.load(Ordering::Relaxed),
|
||||
bytes: ALLOC_BYTES.load(Ordering::Relaxed),
|
||||
deallocs: DEALLOC_COUNT.load(Ordering::Relaxed),
|
||||
}
|
||||
}
|
||||
|
||||
fn diff(start: Snapshot, end: Snapshot, iterations: u64) -> (f64, f64, f64) {
|
||||
(
|
||||
(end.allocs - start.allocs) as f64 / iterations as f64,
|
||||
(end.bytes - start.bytes) as f64 / iterations as f64,
|
||||
(end.deallocs - start.deallocs) as f64 / iterations as f64,
|
||||
)
|
||||
}
|
||||
|
||||
/// Run `op` `iterations` times and return per-op alloc/bytes/dealloc
|
||||
/// counts. The hint is passed through `std::hint::black_box` to keep
|
||||
/// the compiler from optimising the work away.
|
||||
fn measure<F>(name: &str, iterations: u64, mut op: F) -> Row
|
||||
where
|
||||
F: FnMut(),
|
||||
{
|
||||
// Warm-up: 1k iterations to settle any one-time setup state.
|
||||
for _ in 0..1024 {
|
||||
op();
|
||||
}
|
||||
let start = snapshot();
|
||||
for _ in 0..iterations {
|
||||
op();
|
||||
}
|
||||
let end = snapshot();
|
||||
let (allocs, bytes, deallocs) = diff(start, end, iterations);
|
||||
Row {
|
||||
name: name.to_string(),
|
||||
iterations,
|
||||
allocs_per_op: allocs,
|
||||
bytes_per_op: bytes,
|
||||
deallocs_per_op: deallocs,
|
||||
}
|
||||
}
|
||||
|
||||
struct Row {
|
||||
name: String,
|
||||
iterations: u64,
|
||||
allocs_per_op: f64,
|
||||
bytes_per_op: f64,
|
||||
deallocs_per_op: f64,
|
||||
}
|
||||
|
||||
fn print_table(rows: &[Row]) {
|
||||
println!();
|
||||
println!(
|
||||
"| {:40} | {:>10} | {:>10} | {:>10} | {:>10} |",
|
||||
"scenario", "iters", "allocs/op", "bytes/op", "deallocs/op"
|
||||
);
|
||||
println!(
|
||||
"| {:40} | {:>10} | {:>10} | {:>10} | {:>10} |",
|
||||
"-".repeat(40),
|
||||
"-".repeat(10),
|
||||
"-".repeat(10),
|
||||
"-".repeat(10),
|
||||
"-".repeat(10)
|
||||
);
|
||||
for row in rows {
|
||||
println!(
|
||||
"| {:40} | {:>10} | {:>10.2} | {:>10.0} | {:>10.2} |",
|
||||
row.name, row.iterations, row.allocs_per_op, row.bytes_per_op, row.deallocs_per_op
|
||||
);
|
||||
}
|
||||
println!();
|
||||
}
|
||||
|
||||
// ---- scenarios ----------------------------------------------------------
|
||||
|
||||
fn make_handle() -> MxReferenceHandle {
|
||||
MxReferenceHandle::from_names(0, 1, 2, 3, "TestObject", 0, 1, 0, "TestAttr", false)
|
||||
.expect("handle")
|
||||
}
|
||||
|
||||
fn bench_write_int32() -> Row {
|
||||
let handle = make_handle();
|
||||
let value = WriteValue::Int32(42);
|
||||
measure("write_message::encode (Int32)", 10_000, || {
|
||||
let bytes = write_message::encode(&handle, &value, 0, 0).unwrap();
|
||||
std::hint::black_box(bytes);
|
||||
})
|
||||
}
|
||||
|
||||
fn bench_write_float() -> Row {
|
||||
let handle = make_handle();
|
||||
let value = WriteValue::Float32(1.5);
|
||||
measure("write_message::encode (Float32)", 10_000, || {
|
||||
let bytes = write_message::encode(&handle, &value, 0, 0).unwrap();
|
||||
std::hint::black_box(bytes);
|
||||
})
|
||||
}
|
||||
|
||||
fn bench_write_double() -> Row {
|
||||
let handle = make_handle();
|
||||
let value = WriteValue::Float64(3.25);
|
||||
measure("write_message::encode (Float64)", 10_000, || {
|
||||
let bytes = write_message::encode(&handle, &value, 0, 0).unwrap();
|
||||
std::hint::black_box(bytes);
|
||||
})
|
||||
}
|
||||
|
||||
fn bench_write_bool() -> Row {
|
||||
let handle = make_handle();
|
||||
let value = WriteValue::Boolean(true);
|
||||
measure("write_message::encode (Boolean)", 10_000, || {
|
||||
let bytes = write_message::encode(&handle, &value, 0, 0).unwrap();
|
||||
std::hint::black_box(bytes);
|
||||
})
|
||||
}
|
||||
|
||||
fn bench_write_string() -> Row {
|
||||
let handle = make_handle();
|
||||
let value = WriteValue::String("hello".to_string());
|
||||
measure("write_message::encode (String, 5 chars)", 10_000, || {
|
||||
let bytes = write_message::encode(&handle, &value, 0, 0).unwrap();
|
||||
std::hint::black_box(bytes);
|
||||
})
|
||||
}
|
||||
|
||||
fn bench_subscription_decode() -> Row {
|
||||
// Build a single-record DataUpdate body once; decode N times.
|
||||
let body = build_data_update_int32_body(42);
|
||||
measure(
|
||||
"NmxSubscriptionMessage::parse_inner (DataUpdate, Int32)",
|
||||
10_000,
|
||||
|| {
|
||||
let msg = NmxSubscriptionMessage::parse_inner(&body).unwrap();
|
||||
std::hint::black_box(msg);
|
||||
},
|
||||
)
|
||||
}
|
||||
|
||||
fn bench_handle_from_names() -> Row {
|
||||
measure("MxReferenceHandle::from_names", 10_000, || {
|
||||
let h = MxReferenceHandle::from_names(
|
||||
0,
|
||||
1,
|
||||
2,
|
||||
3,
|
||||
"TestChildObject",
|
||||
0,
|
||||
1,
|
||||
0,
|
||||
"TestInt",
|
||||
false,
|
||||
)
|
||||
.unwrap();
|
||||
std::hint::black_box(h);
|
||||
})
|
||||
}
|
||||
|
||||
// ---- helpers (mirror the test fixtures in subscription_message.rs) -----
|
||||
|
||||
fn build_data_update_int32_body(value: i32) -> Vec<u8> {
|
||||
// Operation id + correlation id are arbitrary 16-byte sequences for
|
||||
// a synthetic body; the codec doesn't reject any GUID shape.
|
||||
const DATA_UPDATE_COMMAND: u8 = 0x33;
|
||||
let operation_id = [0x11u8; 16];
|
||||
// 15-byte record prefix: status(4) + quality(2) + filetime(8) + wire_kind(1).
|
||||
// wire_kind = 0x02 = Int32. Then the 4-byte i32 LE payload.
|
||||
let mut record = Vec::with_capacity(15 + 4);
|
||||
record.extend_from_slice(&0i32.to_le_bytes()); // status
|
||||
record.extend_from_slice(&0x00C0u16.to_le_bytes()); // quality
|
||||
record.extend_from_slice(&0i64.to_le_bytes()); // filetime
|
||||
record.push(0x02); // wire_kind = Int32
|
||||
record.extend_from_slice(&value.to_le_bytes());
|
||||
|
||||
let mut out = Vec::with_capacity(23 + record.len());
|
||||
out.push(DATA_UPDATE_COMMAND);
|
||||
out.extend_from_slice(&1u16.to_le_bytes()); // version
|
||||
out.extend_from_slice(&1i32.to_le_bytes()); // record_count = 1
|
||||
out.extend_from_slice(&operation_id);
|
||||
out.extend_from_slice(&record);
|
||||
out
|
||||
}
|
||||
|
||||
// ---- main --------------------------------------------------------------
|
||||
|
||||
fn main() {
|
||||
println!("F38 — mxaccess-codec allocation-count baseline");
|
||||
println!("Counting allocator: thin GlobalAlloc wrapper around System.");
|
||||
println!("R12 target: < 5 allocations per write at steady state.");
|
||||
|
||||
let rows = vec![
|
||||
bench_write_int32(),
|
||||
bench_write_float(),
|
||||
bench_write_double(),
|
||||
bench_write_bool(),
|
||||
bench_write_string(),
|
||||
bench_handle_from_names(),
|
||||
bench_subscription_decode(),
|
||||
];
|
||||
|
||||
print_table(&rows);
|
||||
|
||||
// R12 gate: emit a non-zero exit code if any encode-write scenario
|
||||
// exceeds the 5-allocs threshold. Decoders are reported but not
|
||||
// gated (the sweep below explicitly excludes them).
|
||||
let mut violations = 0;
|
||||
for row in &rows {
|
||||
if row.name.starts_with("write_message::encode") && row.allocs_per_op >= 5.0 {
|
||||
eprintln!(
|
||||
"R12 violation: {} allocates {:.2}/op (>= 5)",
|
||||
row.name, row.allocs_per_op
|
||||
);
|
||||
violations += 1;
|
||||
}
|
||||
}
|
||||
if violations > 0 {
|
||||
std::process::exit(1);
|
||||
}
|
||||
}
|
||||
Reference in New Issue
Block a user