master
4 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
ceeaeefa71 |
[F52.3] mxaccess-codec: caller-supplied scratch buffer for write encoder
Adds `write_message::encode_into_bytes_mut` (and the timestamped
variant) which writes the encoded body into a caller-supplied
`BytesMut`. The buffer is cleared and resized in place each call;
once it has grown to the largest body the session will produce, it
allocates nothing further.
A session that holds a single `BytesMut` and reuses it across writes:
- Int32 / Float32 / Float64: 2 → 1 allocs/op
(only the `encode_scalar_value` scratch `Vec<u8>` remains)
- Boolean: 1 → 0 allocs/op
(no per-value scratch — the literal payload is a stack `[u8; 4]`)
Bench delta in `design/M6-bench-baseline.md` § F52.3. The
`encode_scalar_value` Vec is the remaining 1 alloc/op for fixed-width
scalars; eliminating it would require inlining the LE-bytes write
into the body slice (left for a follow-up since the F52 spec only
asks for 2 → 1).
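The reuse pattern described above can be sketched as follows. This is a minimal illustration, not the crate's encoder: a plain `Vec<u8>` stands in for `BytesMut` (which offers the same `clear` / `resize` calls), `encode_i32_into` is a hypothetical name, and the 4-byte little-endian body is an assumed shape chosen only to show the clear-and-resize-in-place step.

```rust
/// Hypothetical stand-in for an `encode_into_*` style function: clears the
/// caller's buffer, resizes it to the body size, and writes the
/// little-endian bytes straight into the body slice.
fn encode_i32_into(value: i32, buf: &mut Vec<u8>) {
    buf.clear(); // keeps the existing capacity
    buf.resize(4, 0); // no reallocation once capacity >= 4
    buf.copy_from_slice(&value.to_le_bytes());
}

/// Encodes a batch of values through one scratch buffer and reports whether
/// the buffer's backing storage stayed in place for the whole batch.
fn reuses_storage(values: &[i32], scratch: &mut Vec<u8>) -> bool {
    let mut stable = true;
    let mut last_ptr = None;
    for &v in values {
        encode_i32_into(v, scratch);
        let ptr = scratch.as_ptr();
        if let Some(prev) = last_ptr {
            stable &= prev == ptr;
        }
        last_ptr = Some(ptr);
    }
    stable
}
```

A session would hold one such buffer for its lifetime and pass it to every write; after the first call has grown it, later calls only overwrite bytes.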
Resolves F52 (all three optimisations landed:
|
||
|
|
4e76b44391 |
[F52.1] mxaccess-codec: BytesMut output buffer for write encoder
Adds `write_message::encode_to_bytes_mut` (and the timestamped variant)
returning a freshly-allocated `BytesMut`. Allocation count is identical
to `encode` (2 allocs/op for fixed-width scalars); the benefit is
downstream: consumers can `BytesMut::split_to` / `freeze` and forward
the body bytes to a wire-level sink without an intermediate copy.
The body builders (`encode_boolean` / `encode_fixed` /
`encode_variable` / `encode_array`) were refactored to fill a pre-sized
`&mut [u8]` rather than each allocating their own `Vec<u8>`. The
dispatcher computes the body size up front via small `*_body_size`
helpers and resizes the destination buffer (Vec or BytesMut) once.
This is also the prerequisite refactor for F52.3.
Bench delta in `design/M6-bench-baseline.md` § F52.1; existing `encode`
row unchanged at 2 allocs/op. All 265 round-trip tests unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c73a33edd8 |
[R3/R4 Path A] mxaccess: port Lmx.dll FUN_10100ce0 synthesizer kernel
Path A landed for R3/R4. The byte->MxStatus synthesizer in Lmx.dll is
FUN_10100ce0 (`analysis/ghidra/exports/Lmx.dll.synthesizer-helpers2-decompile.md`),
a 4-byte u32 LE -> 4-tuple MxStatus decoder used by every NMX-frame
parser in Lmx.dll. The kernel is byte-deterministic and context-free,
so it ports as a pure function -- the operation-tracking state
machine the original verdict deferred is NOT required for synthesis.
Bit layout (per FUN_10100ce0 lines 21-24):
bit 31: success (-1 if set, 0 if clear)
bits 27..24: category (4 bits)
bits 23..20: detected_by (4 bits)
bits 15..0: detail (i16 -- low 16 bits, signed)
bits 30..28, 19..16: reserved/padding
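As an illustration of the bit layout above, here is a minimal sketch of a decoder and its inverse. `PackedStatus` and its field types are assumptions made for the example (the real type is `MxStatus`), and dropping the reserved bits when re-packing is likewise an assumption rather than something the decompile is known to confirm.

```rust
/// Hypothetical stand-in for the 4-tuple status; all fields are i16 here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PackedStatus {
    success: i16,     // -1 if bit 31 is set, 0 if clear
    category: i16,    // bits 27..24
    detected_by: i16, // bits 23..20
    detail: i16,      // bits 15..0, reinterpreted as signed
}

/// Decodes a packed status word following the layout listed above.
const fn from_packed_u32(word: u32) -> PackedStatus {
    PackedStatus {
        success: if word & 0x8000_0000 != 0 { -1 } else { 0 },
        category: ((word >> 24) & 0xF) as i16,
        detected_by: ((word >> 20) & 0xF) as i16,
        detail: (word & 0xFFFF) as u16 as i16,
    }
}

/// Inverse of `from_packed_u32`; reserved bits are written as zero
/// (an assumption of this sketch).
const fn to_packed_u32(s: PackedStatus) -> u32 {
    let success = if s.success != 0 { 0x8000_0000u32 } else { 0 };
    success
        | ((s.category as u32 & 0xF) << 24)
        | ((s.detected_by as u32 & 0xF) << 20)
        | (s.detail as u16 as u32)
}

/// The wire form is 4 bytes, little-endian.
fn from_wire(bytes: [u8; 4]) -> PackedStatus {
    from_packed_u32(u32::from_le_bytes(bytes))
}
```

Because both functions are `const fn`, the decoder can be evaluated at compile time, which is the property the commit relies on when it promotes `from_i16` / `to_i16` to `const fn`.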
Codec changes:
- MxStatus::from_packed_u32() / ::to_packed_u32() -- the kernel +
inverse for round-trip parity.
- MxStatus::from_nmx_response_code() -- the constructed-from-response-
code switch in FUN_1010bd10:741-770 (six proven mappings: 0x01, 0x02
-> CommunicationError + RequestingNmx; 0x03 -> ConfigurationError +
RequestingNmx; 0x04 -> ConfigurationError + RespondingNmx; 0x05 ->
CommunicationError + RespondingNmx; 0x1A -> CommunicationError +
RequestingNmx).
- MxStatusCategory / MxStatusSource: from_i16/to_i16 promoted to const
fn so MxStatus::from_packed_u32 can be const.
- NmxOperationStatusMessage::try_parse_process_data_received_body() --
thin wrapper that peels the outer NmxObservedEnvelope before
delegating to try_parse_inner. Mirrors
NmxOperationStatusMessage.TryParseProcessDataReceivedBody (.NET cs:20-32).
- NmxOperationStatusMessage::promote_to_typed() -- entry point that
returns the existing Status field. Documented as a no-op pass-through
for now (the 5-byte inner-body wire shape is NOT the same field as
the 4-byte packed-u32 the kernel decodes); kept for API symmetry.
- 22 new round-trip tests covering the kernel, the response-code
switch, the proven 0x00/0x41/0xEF completion bytes, and round-trip
for every canonical sentinel.
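The six response-code mappings listed under `from_nmx_response_code()` can be written as a single match. This is a sketch only: the enums are cut down to the variants named in this message, `map_response_code` is a hypothetical name, and returning `None` for unlisted codes is an assumption (the message does not say how unproven codes are handled).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    CommunicationError,
    ConfigurationError,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Source {
    RequestingNmx,
    RespondingNmx,
}

/// Maps an NMX response code to (category, detected-by source) for the six
/// proven codes; any other code yields `None` in this sketch.
const fn map_response_code(code: u8) -> Option<(Category, Source)> {
    match code {
        0x01 | 0x02 | 0x1A => Some((Category::CommunicationError, Source::RequestingNmx)),
        0x03 => Some((Category::ConfigurationError, Source::RequestingNmx)),
        0x04 => Some((Category::ConfigurationError, Source::RespondingNmx)),
        0x05 => Some((Category::CommunicationError, Source::RespondingNmx)),
        _ => None,
    }
}
```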
mxaccess (Session) changes:
- New OperationKind enum (Write/WriteSecured/Read/Subscribe/
Unsubscribe/Activate/Suspend/Other).
- New OperationContext struct (correlation_id, op_kind, reference,
retry_count) -- groundwork for the F54 follow-on per-operation
correlation work.
- New OperationStatus event type {raw, status, context,
is_during_recovery}, mirroring MxNativeOperationStatusEvent (cs:73-78)
with the typed-MxStatus addition.
- Session::operation_status_events() -> broadcast::Receiver<Arc<
OperationStatus>> + operation_status_stream() Stream variant.
- callback_router() now tries operation-status parsing first, falling
through to subscription messages -- matches MxNativeSession
.OnCallbackReceived dispatch order (cs:574,582,590).
- recover_connection() flips a recovery_active counter (Arc<AtomicU32>
shared with the router) so OperationStatus.is_during_recovery is
populated correctly. Mirrors MxNativeSession._recoveryActive
Volatile.Read at cs:573.
- 3 new router tests covering: status-word frame dispatch + typed
promotion to WriteCompleteOk; completion-only frames stay verbatim;
is_during_recovery is stamped from the live counter.
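The shared recovery counter can be pictured roughly as below. This is a sketch under assumptions: `RecoveryGuard` and the free function `is_during_recovery` are hypothetical names, the RAII guard is one possible way to flip the counter, and `SeqCst` ordering is chosen for simplicity rather than taken from the source.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// Hypothetical guard: increments the shared counter while a recovery is
/// in flight and decrements it again when dropped.
struct RecoveryGuard {
    counter: Arc<AtomicU32>,
}

impl RecoveryGuard {
    fn enter(counter: &Arc<AtomicU32>) -> Self {
        counter.fetch_add(1, Ordering::SeqCst);
        Self { counter: Arc::clone(counter) }
    }
}

impl Drop for RecoveryGuard {
    fn drop(&mut self) {
        self.counter.fetch_sub(1, Ordering::SeqCst);
    }
}

/// What the router side would read when stamping an event.
fn is_during_recovery(counter: &AtomicU32) -> bool {
    counter.load(Ordering::SeqCst) > 0
}
```

Using a counter rather than a boolean keeps the flag correct if two recoveries ever overlap: the flag clears only when the last guard is dropped.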
Per-operation context tracking (correlating completion frames back to
outstanding writes/subscribes via the correlation_id) is filed as F54
in design/followups.md. The synthesizer kernel itself is byte-
deterministic, so the kernel and the correlation work are decoupled.
Ghidra evidence (the next-ring xref walk beyond FUN_10114a90):
- analysis/ghidra/exports/Lmx.dll.set-attribute-result-xrefs.md --
xrefs to OnSetAttributeResult / CancelWithStatus / OperationComplete.
- analysis/ghidra/exports/Lmx.dll.vtable-data-xrefs.md -- vtable-slot
data xrefs for the virtual-dispatch path.
- analysis/ghidra/exports/Lmx.dll.synthesizer-decompile.md --
ScanOnDemandCallback::OperationComplete/MultipleOperationComplete
(FUN_1010b990), RemotePlatformResolver::OperationComplete
(FUN_1010dc80), and the constructed-from-responseCode synthesizer
in FUN_1010bd10 (lines 698-770). FUN_1010bd10 is the wire-frame
receiver that drives the synthesis.
- analysis/ghidra/exports/Lmx.dll.synthesizer-helpers-decompile.md --
FUN_10003fc0 (the <success %d category %d ...> formatter; confirms
the 4-tuple layout), FUN_1008f150 (dispatch helper).
- analysis/ghidra/exports/Lmx.dll.synthesizer-helpers2-decompile.md --
FUN_10100ce0 (the kernel itself), FUN_10100bc0 (3xu16 reader),
FUN_1005e580 (4-byte stream reader), FUN_1010ee00 (sister NMX-frame
parser using the same kernel).
- analysis/ghidra/exports/Lmx.dll.synthesizer-callers-xrefs.md --
caller graph; confirms the kernel is called from many wire-frame
parsers and that every parser shares the single 4-byte decoder.
R3/R4 verdict updated in design/70-risks-and-open-questions.md from
"settled at verbatim-preserve" to "settled per Path A". F54 filed in
design/followups.md for the per-operation correlation work.
cargo build / test / clippy -D warnings / RUSTDOCFLAGS=-D warnings doc
all clean. cargo public-api baselines regenerated for mxaccess and
mxaccess-codec.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
71c69b80c6 |
[F38] mxaccess-codec: counting-allocator bench harness + R12 baseline
Hand-rolled GlobalAlloc wrapper around System that tracks allocs +
bytes + deallocs via two atomics. Each scenario runs 10k iterations
after a 1k warm-up; output is a markdown table with allocs/op,
bytes/op, deallocs/op.
Why hand-rolled (not dhat/criterion): R12 gates on a single number
("< 5 allocs/write"). dhat is heap-profiling-oriented (call-stack
attribution, JSON snapshots); criterion measures wall-clock latency,
which is reported-but-not-gated per 60-roadmap.md:104. A 50-line
GlobalAlloc + atomic counters is the simplest thing that answers
the gate.
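For readers unfamiliar with the technique, a counting allocator of this kind can be sketched as below. It is not the harness itself: the names are hypothetical, three separate counters are used for clarity, and the measurement helper assumes nothing else allocates on other threads while it runs.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

static ALLOCS: AtomicUsize = AtomicUsize::new(0);
static BYTES: AtomicUsize = AtomicUsize::new(0);
static DEALLOCS: AtomicUsize = AtomicUsize::new(0);

/// Forwards to the system allocator while counting every call.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        DEALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `op` for `warmup` untimed iterations, then `iters` counted ones,
/// and returns the mean number of allocations per counted iteration.
fn allocs_per_op<F: FnMut()>(warmup: usize, iters: usize, mut op: F) -> f64 {
    for _ in 0..warmup {
        op();
    }
    let before = ALLOCS.load(Ordering::Relaxed);
    for _ in 0..iters {
        op();
    }
    let after = ALLOCS.load(Ordering::Relaxed);
    (after - before) as f64 / iters as f64
}

/// Mirrors the gate described in this message: 5 or more allocs/op fails.
fn fails_gate(per_op: f64) -> bool {
    per_op >= 5.0
}
```

The default `realloc` and `alloc_zeroed` implementations route through `alloc` / `dealloc`, so growth of a `Vec` is counted without overriding them.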
Run: `cargo bench -p mxaccess-codec`
Baseline numbers (release, Windows x64):
- Bool write: 1.00 allocs/op
- Int32 write: 2.00 allocs/op
- Float32 write: 2.00 allocs/op
- Float64 write: 2.00 allocs/op
- String write: 4.00 allocs/op (5-char string)
- Handle from_names: 2.00 allocs/op
- DataUpdate decode: 1.00 allocs/op
R12's < 5 allocs/write target is **already met** across the proven
matrix without any zero-copy work. The bench gates on this — any
write_message::encode scenario at >= 5 allocs/op exits the harness
with code 1.
Companion: `design/M6-bench-baseline.md` documents the numbers,
explains the per-scenario breakdown, and tightens F39's scope from
"hit the target" to "nice-to-have optimisations" (BytesMut output
buffer, name-signature cache, session-level scratch pool).
Workspace: 759 tests still pass; clippy --benches clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|