[F49 step 1 + F56] callback router: peel envelope before parsing subscription / 0x11 frames

The router used to call NmxSubscriptionMessage::parse_inner directly
on the COM-stub-delivered body, but the wire bytes arrive wrapped in
a ProcessDataReceived envelope (46-byte header + optional 4-byte
length prefix); parse_inner expects post-envelope bytes. Result:
every 0x33 DataUpdate that ever arrived was silently dropped.

Mirrors the .NET reference's MxNativeSession.OnCallbackReceived flow
at cs:582-606 — three sequential parse attempts:
  1. NmxOperationStatusMessage::try_parse_process_data_received_body  (already wired)
  2. NmxReferenceRegistrationResultMessage::try_parse_...              (NEW — was missing)
  3. NmxSubscriptionMessage::try_parse_process_data_received_body      (NEW — was wrong)

Adds:
- NmxSubscriptionMessage::try_parse_process_data_received_body — peels
  envelope via NmxObservedEnvelope::parse_process_data_received_body_flexible,
  then dispatches to existing parse_inner.
- NmxReferenceRegistrationResultMessage::try_parse_process_data_received_body —
  same shape, for the 0x11 registration-result frame.
- Router branch for 0x11 — currently traces the assigned item_handle and
  drops the frame (matches the .NET reference, which fires a
  ReferenceRegistrationReceived event with no consumer in the codebase).
- Router fall-through trace! when neither path matches, so future
  unparseable bodies surface in RUST_LOG=trace instead of vanishing.
- DcomCallbackSink::forward — trace! per inbound callback so
  RUST_LOG=mxaccess_callback=trace surfaces opnum + size.
- crates/mxaccess-compat/tests/buffered_subscribe_live.rs — F49 step 1
  live test that drives subscribe_buffered + a 500ms-cadence writer.
  Also pulls tracing-subscriber as a dev-dep so the test can dump
  router activity.

Existing router_task_decodes_callback_invoked_into_broadcast unit test
updated to wrap its synthetic 0x32 body in an envelope so the new
parse path actually accepts it.

Live result: F56 — the buffered round-trip *registers* successfully
(RegisterReference returns HRESULT 0; engine sends one 0x11
RegistrationResult + one 51-byte op-status per write, perfectly
clocked) but the engine never sends a 0x33 DataUpdate. Rust-port-
specific gap vs the .NET reference's working buffered path; root
cause is likely a field-level difference in the RegisterReference
body or a missing post-RegisterReference step. Captured as F56 in
design/followups.md, blocking F49 step 1; F56's DoD is the same
live test reporting >=3 DataChange arrivals.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-06 09:50:57 -04:00
parent 2fc327a8d5
commit af15fe7587
8 changed files with 477 additions and 24 deletions
+25
View File
@@ -41,6 +41,8 @@ Between each publish: wait for the crate to be indexed before the next one's `ca
**Step 5 unblocked 2026-05-06 by F55 / Path A.** `cargo test -p mxaccess-compat --features live-windows-com --test lmx_write_complete_live -- --ignored --nocapture` passes against the live AVEVA install: RegisterEngine2 OK, write round-trips, OnWriteComplete fires with the expected `WriteCompleteEvent { server_handle, item_handle, statuses, is_during_recovery }` shape. Steps 1-4 still pending.
**Step 1 attempted 2026-05-06 — blocked by F56.** Added `crates/mxaccess-compat/tests/buffered_subscribe_live.rs` driving `Session::subscribe_buffered` via `Session::connect_nmx_auto`. RegisterReference completes successfully against the live engine, but no `0x33` DataUpdate frames are ever received — only op-status frames per write and the `0x11` registration-result. The Rust port's wire frame for RegisterReference must differ from the .NET reference's in some field. The codec router was also fixed during this attempt (envelope-peeling for `NmxSubscriptionMessage` and the `0x11` `NmxReferenceRegistrationResultMessage` path were both missing); those are real bugs that would have hidden any DataUpdate had one arrived. F56 captures the open work.
**Definition of done:**
1. Per-feature evidence summary in `docs/M6-live-verification.md` (one paragraph per feature with the wire-trace excerpt or metrics-exporter snapshot).
2. If any feature fails live: file a sub-followup with the captured failure and link it from the evidence doc.
@@ -100,6 +102,29 @@ Between each publish: wait for the crate to be indexed before the next one's `ca
**Resolves when:** the lint is on and the workspace doc build is warning-clean with it.
### F56 — Buffered subscribe completes RegisterReference but receives no `0x33` DataUpdate frames
**Severity:** P1 — blocks F49 step 1 (F36 buffered live verification) and any consumer relying on `Session::subscribe_buffered` to surface value changes.
**Source:** F49 step 1 live attempt 2026-05-06. Test `cargo test -p mxaccess-compat --features live-windows-com --test buffered_subscribe_live -- --ignored --nocapture` (added in this session) connects via `Session::connect_nmx_auto` (F55-proven), issues `subscribe_buffered(TestChildObject.TestInt, 1000ms)` against the live engine, and runs a background writer at 500ms cadence. RegisterReference returns HRESULT 0; the engine then fires:
- One 46-byte heartbeat envelope (header-only, empty inner)
- One 51-byte op-status frame for the `RegisterReference` completion
- One 87-byte `0x11` `NmxReferenceRegistrationResultMessage` carrying the assigned `item_handle`
- One 51-byte op-status frame **per write** (60 frames over 30s — perfectly clocked to the writer cadence)
But **zero `0x33` `DataUpdate` frames** ever arrive — verified end-to-end via `RUST_LOG=trace mxaccess_callback=trace`. The .NET reference's `MxNativeSession.SubscribeBufferedAsync` does deliver DataUpdates against the same engine + same tag (per F36 wave 1 evidence at `captures/094-frida-buffered-separate-writer/`), so this is a Rust-port-specific gap.
**Likely causes (in priority order):**
1. The `NmxReferenceRegistrationMessage` body the Rust port sends differs in some field from the .NET reference's. Specifically: `subscribe: true` is set, but other fields (e.g. `item_handle = 0`, `reserved_*`, `source_galaxy_id`) may need different values to trigger DataUpdate dispatch. **Action**: capture the wire bytes from the Rust port's RegisterReference and diff against `captures/094-frida-buffered-separate-writer/` per-byte.
2. Some additional client-side step is required after RegisterReference — e.g. an ACK of the `0x11` registration result via the assigned `item_handle`, or a separate RPC the .NET reference dispatches that we miss. The F36 wave 1 evidence said no `SetBufferedUpdateInterval` is sent, but there may be another op. **Action**: capture .NET reference's outbound calls during `subscribe-buffered` end-to-end and compare to ours.
3. The `0x11` registration-result body might carry a status code we should be checking (see `NmxReferenceRegistrationResultMessage::status_category` / `status_detail`). If non-zero, the engine may have rejected the subscription silently. **Action**: log the parsed `0x11` body and check the status fields.
**What's already wired (this session):** `NmxSubscriptionMessage::try_parse_process_data_received_body` (envelope-peeling helper) was added — the previous router called `parse_inner` directly on wire bytes and would have silently dropped any `0x33` that did arrive. This was a real bug fix; without it F56 would have stayed invisible. Same for `NmxReferenceRegistrationResultMessage::try_parse_process_data_received_body` + the `0x11` path in the router.
**Does not affect:** `Session::write` round-trip (proven by F55 live test); plain `Session::subscribe` (not yet live-tested but uses `AdviseSupervisory` not `RegisterReference`).
**Definition of done:** F49 step 1 passes — `cargo test -p mxaccess-compat --features live-windows-com --test buffered_subscribe_live -- --ignored --nocapture` reports at least 3 `DataChange` arrivals at the configured cadence, with monotonically-increasing values matching the writer.
**Resolves when:** the missing field / step / status check is identified, the fix lands in `Session::subscribe_buffered_nmx` (or upstream), and the live test passes.
### F55 — Hand-rolled callback exporter rejected by `RegisterEngine2` on this AVEVA install
**Status:** Resolved 2026-05-06 by Path A (DCOM-managed `INmxSvcCallback` sink in `mxaccess-callback::dcom_sink`, wired into `Session::from_nmx_client` behind the `windows-com` feature). Live test `cargo test -p mxaccess-compat --features live-windows-com --test lmx_write_complete_live -- --ignored --nocapture` passes end-to-end: RegisterEngine2 succeeds, write round-trips, OnWriteComplete fires with status from the wire. The hand-rolled `CallbackExporter` is retained for unit tests that exercise the exporter against an in-process fake NMX peer.
**Severity:** P1 — blocks F49 live verification of every M6 feature that needs an `Engine` registered (i.e. all of them).