[F56] subscribe / subscribe_buffered: split-form wire body + diagnose Galaxy fixture gap
Three real fixes + one architectural diagnosis:
1. Session::subscribe_buffered_nmx now sends the .NET-reference split
form on the wire:
item_definition = "<attr>.property(buffer)" (was: full reference)
item_context = "<object_tag_name>" (was: empty)
item_handle = SessionInner::next_item_handle.fetch_add(1)
(was: hardcoded 0)
Verified byte-identical against captures/082 + 094 by the existing
buffered_register_reference_parity unit tests. The
item_handle counter mirrors MxNativeCompatibilityServer's
_nextItemHandle++ at MxNativeSession.cs:613.
2. New live tests:
- tests/buffered_subscribe_live.rs (F49 step 1) — uses real Galaxy
metadata via SqlTagResolver + connect_nmx_auto, drives a
background writer at 500ms cadence to force value-changes,
drains DataChange events from Subscription.
- tests/plain_subscribe_live.rs — same harness over plain
Session::subscribe (NOT buffered), used to isolate whether
"no DataUpdate" is buffered-specific (it's not — both fail).
Both pull tracing-subscriber as a dev-dep so `RUST_LOG=trace`
surfaces dcom_sink + router activity.
3. mxaccess-galaxy/sql_resolver.rs: drop the inner-attribute
`#![cfg(feature = "galaxy-resolver")]` — the module-level cfg on
`pub mod sql_resolver` in lib.rs already handles this and Rust
1.85's clippy::duplicated_attributes lint flagged the duplicate
once mxaccess-compat dev-deps activated the feature.
4. F56 finding (diagnosis, NOT a bug fix): the engine on this Galaxy
install does not have an active value for TestChildObject.TestInt.
Confirmed by running the .NET reference's own probe:
dotnet run --project src/MxNativeClient.Probe -c Release \
-- --probe-session-subscribe --tag=TestChildObject.TestInt \
--subscribe-hold-seconds=10
...returns ONE 0x32 SubscriptionStatus (status=3 detail=3
quality=0x00C0 Uncertain value=null) and zero 0x33 DataUpdates —
matching the Rust port's symptom exactly. Not a Rust port bug,
not a wire-byte gap. F49 steps 1-3 need either an actively-
scanned tag or local Galaxy reconfiguration to scan
TestChildObject.TestInt.
Workspace tests + clippy clean under both feature configurations.
F56 entry in design/followups.md updated with the full diagnostic
chain so future-me / future-collaborators can pick it up without
re-tracing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
+27
-2
@@ -102,8 +102,33 @@ Between each publish: wait for the crate to be indexed before the next one's `ca
|
||||
|
||||
**Resolves when:** the lint is on and the workspace doc build is warning-clean with it.
|
||||
|
||||
### F56 — Buffered subscribe completes RegisterReference but receives no `0x33` DataUpdate frames
|
||||
**Severity:** P1 — blocks F49 step 1 (F36 buffered live verification) and any consumer relying on `Session::subscribe_buffered` to surface value changes.
|
||||
### F56 — `subscribe` / `subscribe_buffered` complete on the wire but never receive `0x33` DataUpdate frames
|
||||
**Status:** Diagnosed 2026-05-06 as a **test-fixture issue, not a Rust port bug**. The .NET reference's own `MxNativeClient.Probe --probe-session-subscribe --tag=TestChildObject.TestInt` returns a single `0x32` SubscriptionStatus with `status=3 detail=3 quality=0x00C0 (Uncertain) value=null` and zero `0x33` DataUpdates — same observation as the Rust port's `subscribe` / `subscribe_buffered` paths. The engine on this Galaxy install does not have a live value for `TestChildObject.TestInt`; nothing is scanning that attribute, so there are no value-changes for the engine to dispatch. F49 steps 1-3 need either (a) a different test tag with active scanning, or (b) configuring the local Galaxy to scan TestChildObject.TestInt before live verification can pass.
|
||||
|
||||
Real codec fixes still landed in this session (envelope-peeling for `NmxSubscriptionMessage` + `0x11` registration-result path + split-form RegisterReference body + per-session item_handle counter); they were necessary preconditions for F49 step 1 even if the test fixture blocks the actual pass criterion.
|
||||
|
||||
**Severity:** P1 — blocks F49 step 1 (F36 buffered live verification), F49 step 2 (F45 recovery replay), and ALL consumers relying on subscription data flow on this Galaxy.
|
||||
|
||||
**Updated 2026-05-06.** Initial diagnosis suspected a buffered-specific wire-body gap; ruled out:
|
||||
- Wire body proven byte-identical to the .NET reference's by `crates/mxaccess-codec/tests/buffered_register_reference_parity.rs` (which forward-builds the message from `Session::subscribe_buffered`'s inputs and compares against `captures/082-frida-add-buffered-plain-advise-testint/`).
|
||||
- Test now uses real Galaxy DB metadata via `mxaccess_galaxy::SqlTagResolver` (engine_id=2, attribute_id=155, etc.) instead of the hardcoded StaticResolver shim.
|
||||
- Item-handle, item_definition, item_context all switched to the .NET-reference split form (handle=1 + per-session counter, definition="<attr>.property(buffer)", context="<object_tag>").
|
||||
|
||||
**Plain subscribe also fails.** Added `crates/mxaccess-compat/tests/plain_subscribe_live.rs` driving `Session::subscribe` (NOT buffered). Same symptom: `AdviseSupervisory` returns HRESULT 0, the engine acks every write with a 51-byte op-status frame, but no `0x33` DataUpdate ever arrives. So this is **not buffered-specific** — the entire inbound DataUpdate path is silent on this machine.
|
||||
|
||||
**Likely revised root cause:**
|
||||
- The engine generates `0x33` DataUpdate frames into a different transport channel than the one our DCOM sink listens on. The .NET reference's `INmxSvcCallback` has two opnums — `DataReceivedRaw` (3) and `StatusReceivedRaw` (4). We only ever observe opnum=3 callbacks. If the engine routes data updates through a different IID or different DCOM stub on this install, our sink never sees them.
|
||||
- Alternatively, the engine on this Galaxy install is configured such that local Object scanning is disabled / the deployed objects aren't actively producing value-change events. The `OnWriteComplete` round-trip works (proves write-path + callback-path); a passive subscription doesn't produce updates if no source changes the value.
|
||||
|
||||
**Action items (for whoever picks F56 up):**
|
||||
1. Compare the **C# DcomCallbackSink** (`src/MxNativeClient/NmxCallbackSink.cs`) to the Rust port's `mxaccess-callback::dcom_sink` — verify it implements **only** `INmxSvcCallback` and that the IID + vtable layout match. There may be a third method or a sibling interface (e.g. `INmxSvcCallback2`) that the engine also calls into for high-cadence DataUpdate dispatch.
|
||||
2. Try the same live test against a tag that has known active scanning (e.g. a bound-to-PLC InputSource attribute) — rule out static-UDA hypothesis.
|
||||
3. Run `MxNativeClient.Probe --probe-session-subscribe --tag=TestChildObject.TestInt --subscribe-hold-seconds=30` (the .NET reference's working live probe) and confirm `0x33` DataUpdates fire on THIS machine. If they do, capture the wire bytes via Frida and diff against the Rust port's exact body.
|
||||
|
||||
**What landed in this session (real router/codec fixes, NOT F56-resolving):**
|
||||
- `NmxSubscriptionMessage::try_parse_process_data_received_body` — peels the `ProcessDataReceived` envelope before calling `parse_inner`. The router previously called `parse_inner` directly on wire bytes, which would have silently dropped any `0x33` even if one arrived.
|
||||
- `NmxReferenceRegistrationResultMessage::try_parse_process_data_received_body` + router branch — drops `0x11` registration-result frames cleanly instead of logging "unexpected opcode 0x11".
|
||||
- `Session::subscribe_buffered_nmx` — split-form (object, attribute) wire body + per-session monotonic `item_handle` counter (mirrors `MxNativeCompatibilityServer.AddBufferedItemAsync`'s `_nextItemHandle++`).
|
||||
**Source:** F49 step 1 live attempt 2026-05-06. Test `cargo test -p mxaccess-compat --features live-windows-com --test buffered_subscribe_live -- --ignored --nocapture` (added in this session) connects via `Session::connect_nmx_auto` (F55-proven), issues `subscribe_buffered(TestChildObject.TestInt, 1000ms)` against the live engine, and runs a background writer at 500ms cadence. RegisterReference returns HRESULT 0; the engine then fires:
|
||||
- One 46-byte heartbeat envelope (header-only, empty inner)
|
||||
- One 51-byte op-status frame for the `RegisterReference` completion
|
||||
|
||||
Reference in New Issue
Block a user