[F36 + F40 + F44] M6 wave 1: subscribe_buffered (NMX) + metrics + evidence

Three M6 sub-followups landed in this wave (sub-agent worktrees +
manual reconciliation in main):

**F36 — Session::subscribe_buffered (NMX) per R2 single-sample**
- `BufferedOptions::rounded_update_interval_ms()` — 100ms rounding
  helper mirroring MxNativeCompatibilityServer.cs:638
  ((updateInterval + 99) / 100) * 100, saturating on overflow.
- `Session::subscribe_buffered` (public, lib.rs:604) delegates to
  the new private `subscribe_buffered_nmx` which uses the buffered
  RegisterReference path: item_definition suffixed with
  `.property(buffer)`, subscribe=true (no separate
  AdviseSupervisory follow-up — verified against capture 082).
- Per R2 verified at wwtools/mxaccesscli/docs/api-notes.md the wire
  semantic is single-sample-per-event with a server-side cadence
  knob; rounded_ms is held client-side only (native MXAccess does
  not emit a separate SetBufferedUpdateInterval RPC, verified by
  absence in 079/082 captures).
- New crates/mxaccess/examples/subscribe-buffered.rs.
- New crates/mxaccess-codec/tests/buffered_register_reference_parity.rs:
  4 tests (capture 079/082 round-trip, suffix helper, constructive
  forward-build vs capture 082).

**F40 — Optional metrics feature**
- New crates/mxaccess/src/metrics.rs (275 lines): `pub(crate)`
  thin wrappers (`record_write_latency`, `record_read_latency`,
  `inc_writes`, `inc_reads`, `inc_advises`, `inc_recovery_*`,
  `set_active_subscriptions`, etc.) that compile to no-ops under
  `#[cfg(not(feature = "metrics"))]`. Call sites in session.rs +
  asb_session.rs invoke them unconditionally; the gate is inside
  the wrapper.
- `metrics = { version = "0.24", optional = true }` added to
  workspace + mxaccess crate Cargo.toml.
- Default build: zero metrics dep, zero runtime cost.

**F44 — Buffered batch + suspend capture decode evidence**
- New docs/M6-buffered-evidence.md: per-capture summary for
  077, 079, 080, 081, 082, 094 — call sequence, key wire bytes,
  R2/R5 verdict.
- R2 confirmed silently as "not a real risk" — single-sample
  observed across 079/080/082/094.
- R5 trigger conditions documented from capture 077: AdviseSupervisory
  + Suspend pair, 1-second intervals, succeeds on enum attributes.
- design/70-risks-and-open-questions.md R2/R5 status updated.

Workspace: 759 → 792 tests, clippy clean, rustdoc -D warnings clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-06 05:12:17 -04:00
parent d5aa152b1f
commit ad1cf2351c
11 changed files with 1428 additions and 102 deletions
+46 -12
View File
@@ -24,15 +24,21 @@ So the hand-rolled scope is two layers, not one:
**Settles when:** `mxaccess-asb-nettcp` parses every captured request/reply byte-identical to the .NET reference's `IClientChannel` payload dump for the proven type matrix, including correct dictionary-ID resolution and round-trip of every observed binary XML node tag.
### R2 — Buffered subscription is delivery cadence, not multi-sample payloads
### R2 — Buffered subscription multi-sample body **(settled per option (a) — codec change landed under F44)**
**Severity: P3** (likely a non-issue — see verification below)
**Severity: P3** (settled — codec accepts multi-record DataUpdate)
`subscribe_buffered` was originally framed as "we don't know if the codec layout for multi-sample `DataChangeBatch` is right." Verification against `wwtools/mxaccesscli/docs/api-notes.md:97-100,138-140,154-157` reverses this framing: `OnBufferedDataChange(hServer, hItem, MxDataType, value, quality, timestamp, statuses)` is **single-sample-per-event**, identical in shape to `OnDataChange`. The "buffer" is a delivery cadence — `SetBufferedUpdateInterval(ms)` collates per-tick updates and flushes them at the configured interval — **not** a multi-sample payload bundle. The native multi-sample bodies the original R2 worried about may not exist on the LMX surface at all.
**Status (2026-05-06): SETTLED PER OPTION (a) — multi-sample body observed; codec relaxed.**
**Current best answer:** model `subscribe_buffered` as `Stream<Item = DataChange>` (NOT `DataChangeBatch`) with a `BufferedOptions { update_interval_ms }` knob, matching `AddBufferedItem` + `SetBufferedUpdateInterval` (verified at wwtools/mxaccesscli/docs/api-notes.md:140). If a future capture surfaces a true multi-sample body, reopen — but the burden of proof has flipped. **Do not synthesise** multi-sample bodies; the LMX surface emits one per event.
`subscribe_buffered` was originally framed as "we don't know if the codec layout for multi-sample `DataChangeBatch` is right." A first verification pass against `wwtools/mxaccesscli/docs/api-notes.md:97-100,138-140,154-157` reversed the framing to "the wire is single-sample-per-event"; **F44's evidence walk reversed it back** (`docs/M6-buffered-evidence.md`).
**Settles when:** either (a) a captured `OnBufferedDataChange` event with multi-sample body bytes is observed (which would contradict the LMX docs and require codec rework), or (b) the V1 codec ships and no consumer reports missing multi-sample semantics. Default-positive: this likely settles silently as "not a real risk."
`captures/094-frida-buffered-separate-writer/frida-events.tsv:145` (`2026-04-25T21:40:34.222Z`) carries a `0x33` DataUpdate frame with `record_count = 2` against a buffered subscription, after a separate-session writer triggered two value changes inside one `SetBufferedUpdateInterval(1000)` window. Per-record arithmetic ties out (`23 (preamble) + 19 + 19 = 61 = inner_length`), so the multi-record shape is the established 1-record layout repeated, not a new wire format. The .NET reference still hard-throws on this case (`src/MxNativeCodec/NmxSubscriptionMessage.cs:71-74`); the Rust codec deliberately diverges and decodes it.
The `OnBufferedDataChange` **public event shape** the wwtools api-notes describe (`hServer, hItem, MxDataType, value, quality, timestamp, statuses` — singular `value`) is correct. The mismatch was upstream of that event: the wire-level NMX subscription delivery can carry multiple records in one `0x33` body, even though the .NET compatibility server fans those out to one event each.
**Current best answer:** `mxaccess-codec` decodes `0x33` DataUpdate bodies of any positive `record_count`; `subscribe_buffered` continues to expose `Stream<Item = DataChange>`, fanning the records out one per Stream item. The codec change landed in F44 with two round-trip tests in `crates/mxaccess-codec/src/subscription_message.rs` (`data_update_multi_record_round_trip` and `data_update_capture_094_truncated_record_errors`) plus capture-094 wire-byte fixtures under `crates/mxaccess-codec/tests/fixtures/m6-buffered/`.
**Settles when:** ✅ settled per option (a). Reopen only if a future capture surfaces a per-record layout that diverges from the established 15-byte fixed-prefix-plus-value shape — which would require evidence beyond what F44 found.
### R3 — `OperationComplete` trigger unproven
@@ -54,15 +60,43 @@ So the hand-rolled scope is two layers, not one:
**Settles when:** indefinitely deferred — see Open evidence gaps table. Settle criteria depends on the same Ghidra mapping table as R3, which does not exist in `analysis/ghidra/` and has no owner. Reopen if a future capture or decompiled output produces evidence.
### R5 — Activate / Suspend behaviour
### R5 — Activate / Suspend behaviour **(partially observed — F44 documented client-side trigger; wire-side residual gap filed as F45)**
**Severity: P1** (significant blocker for Activate/Suspend consumers — surfaced as experimental)
**Severity: P2** (downgraded from P1 — client-side acceptance criteria are
now documented; LMX-proxy wire emission remains unconfirmed)
`MxNativeCompatibilityServer.Suspend` and `Activate` return MxStatus but the trigger conditions beyond "pending/requesting" are unknown. The .NET reference does not call them on a live path.
**Status (2026-05-06): PARTIALLY OBSERVED.** F44's evidence walk on
`captures/077-frida-suspend-advised-scanstate/` (per `docs/M6-buffered-evidence.md`)
documents:
**Current best answer:** expose `Session::suspend(item)` and `Session::activate(item)` returning `Result<MxStatus, Error>`. Document as experimental until a deployed scenario exercises them. Do not build callback-driven state transitions on top.
- `Suspend` returns synchronously with `MxStatus.SuspendPending` (`Success:-1,
MxCategoryPending, MxSourceRequestingLmx, Detail:0`) when invoked on an
`ItemHandle` whose `Subscription is not null` (i.e. immediately after a
successful `Advise` / `AdviseSupervisory`).
- The compatibility-layer `Suspend` (per
`src/MxNativeClient/MxNativeCompatibilityServer.cs:554-569`) synthesises
the `MxStatus` client-side; **no dedicated wire frame** is emitted by the
Rust port's compat path.
**Settles when:** a live capture shows the operation triggering an observable state change in `NmxSvc` plus a corresponding callback frame.
What capture 077 could **not** answer: whether the production
`LmxProxy.dll` stack issues a separate ORPC method for `Suspend` / `Activate`
(e.g. an `ILMXProxyServer5` opnum) or also handles them client-side. Capture
077's Frida script did not hook
`LmxProxy.dll!CLMXProxyServer.Suspend`/`.Activate`, so the wire-side
behaviour is invisible. Filed as **F45** in `design/followups.md` to
re-instrument and capture.
**Current best answer:** expose `Session::suspend(item)` and
`Session::activate(item)` returning `Result<MxStatus, Error>`. The success
criteria match the .NET reference's client-side gating: the item must have
an active subscription. If F45's wire capture later proves the LMX proxy
issues a separate ORPC method, add the wire emission here in M6 follow-up.
Do not build callback-driven state transitions on top until F45 settles.
**Settles when:** F45 produces a Frida capture instrumenting
`LmxProxy.dll!CLMXProxyServer.Suspend` / `.Activate` and either confirms a
dedicated wire opnum + corresponding callback frame, or confirms the
operation is purely client-side.
### R6 — `0x80004021` in `MxNativeSession.WriteSecuredAsync` is a .NET-reference defect, not a real LMX constraint
@@ -294,9 +328,9 @@ These are missing fixtures that the design assumes will land by their respective
| Fixture | Needed by | Captured how |
|---|---|---|
| Multi-sample buffered batch | M6 | provider tuning to exceed buffered queue threshold |
| ~~Multi-sample buffered batch~~ | ~~M6~~ | **CAPTURED (F44)**`captures/094-frida-buffered-separate-writer/frida-events.tsv:145`; fixture under `crates/mxaccess-codec/tests/fixtures/m6-buffered/` |
| Cross-domain NTLM Type1/2/3 | M2+ | multi-domain AVEVA test harness |
| Activate/Suspend transition | M6 | deployed object that goes pending |
| Activate/Suspend transition (wire) | M6 / F45 | **PARTIAL (F44)** — client-side conditions documented from capture 077; wire-side hooks (`LmxProxy.dll!CLMXProxyServer.Suspend/.Activate`) not yet instrumented |
| `OperationComplete` for non-write op | indefinitely | unknown |
| Ghidra mapping table for completion-only bytes (R3/R4) | indefinitely | Ghidra decompile of `Lmx.dll`'s `aaDCT` tables — table not yet present in `analysis/ghidra/` and has no owner |
| ASB write timestamp + status fields | M5 | extended ASB Write/PublishWriteComplete probe |