[F56 resolved] subscribe paths now drive 0x33 DataUpdate frames
Root cause: `Session::subscribe` and `Session::subscribe_buffered_nmx`
were missing the `INmxService2::Connect` + `AddSubscriberEngine` RPC
pair that the .NET reference's `MxNativeSession.EnsurePublisherConnected`
(`cs:516-526`) issues before the first advise against a publishing
engine. Without those two RPCs, NmxSvc accepted the subscription
registration but the publishing engine never knew our engine was
subscribed — so it never dispatched DataUpdate frames back.
Diagnosis driven by wwtools/aalogcli reading
C:\ProgramData\ArchestrA\LogFiles. The user pointed at this tooling
which lit up the path.
Red herring: NmxSvc's `[Warning] NmxCallback->DataReceived ... failed
with error 0x{N}` log lines turned out to be normal log spam where N
is the bufferSize of the inbound call, not a real error code. The
.NET reference's own probe triggers identical entries while still
receiving DataUpdate frames successfully.
Fix:
- SessionInner::publisher_endpoints — per-session HashMap<(platform_id,
engine_id), ()> cache mirroring MxNativeSession._publisherEndpoints.
- Session::ensure_publisher_connected — issues Connect +
AddSubscriberEngine, once per publisher endpoint per session.
- Session::subscribe + subscribe_buffered_nmx — both call it before
the wire advise.
- subscribe_buffered_nmx — additionally issues AdviseSupervisory after
RegisterReference. The .NET reference's RegisterBufferedItemAsync
only calls RegisterReference, but on this AVEVA install
RegisterReference alone produces the registration result + heartbeat
callbacks without ever starting DataUpdate dispatch; AdviseSupervisory
unblocks the dispatch.
Live verification (`TestMachine_001.TestChangingInt`, a tag that
updates >1×/s):
cargo test -p mxaccess-compat --features live-windows-com \
--test plain_subscribe_live -- --ignored --nocapture
cargo test -p mxaccess-compat --features live-windows-com \
--test buffered_subscribe_live -- --ignored --nocapture
Both pass — `cmd=0x32` SubscriptionStatus + sequence of `cmd=0x33`
DataUpdate frames flow as expected. Tests assert on the raw
Session::callbacks() broadcast (not the typed Subscription::next
DataChange path) because the engine reports quality=Uncertain
value=null for this attribute on this Galaxy — the wire-level
subscription is what F56 was about, not the value content.
DcomCallbackSink reverted to S_OK return for both DataReceivedRaw
and StatusReceivedRaw (the bytes-processed / sentinel HRESULT
experiments during diagnosis turned out to be irrelevant — the
"failed with error 0xN" logs come from NmxSvc regardless of the
return value).
design/followups.md F49 + F56 + docs/M6-live-verification.md updated:
F56 resolved, F49 steps 1 + 4 + 5 pass live, steps 2 + 3 pending
(now executable on this fixture).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
+20
-3
@@ -26,7 +26,7 @@ Between each publish: wait for the crate to be indexed before the next one's `ca
|
||||
**Resolves when:** crates.io shows all 9 crates published + the V1 tag is pushed.
|
||||
|
||||
### F49 — Live verification sweep for the M6 features
|
||||
**Status:** Steps 4 + 5 resolved 2026-05-06 (`docs/M6-live-verification.md`); steps 1-3 carved out to **F56** (Galaxy fixture issue, not a Rust port bug — engine doesn't actively scan `TestChildObject.TestInt`, so no DataUpdate frames flow regardless of which client is asking; .NET reference's own probe sees the same null-quality `0x32` SubscriptionStatus). Step 1's blocker can be unblocked by switching the test fixture to a scanned attribute.
|
||||
**Status:** Steps 1, 4, 5 resolved 2026-05-06 (`docs/M6-live-verification.md`). F56 turned out to be a real Rust-port bug (missing `EnsurePublisherConnected` RPC pair) and was fixed; both `subscribe` and `subscribe_buffered` now drive `0x33` DataUpdate frames end-to-end against `TestMachine_001.TestChangingInt`. Steps 2 (F45 recovery replay live) and 3 (F47 buffered unsubscribe skip live) remain — they're now executable on this fixture but not yet run.
|
||||
**Severity:** P1 — closes the live-evidence gap for the M6 work that landed unit-only this session.
|
||||
**Source:** F36, F40, F45, F47, F54 closeouts — each ships with unit tests but most were not exercised against the live AVEVA install in this session.
|
||||
**Blocked-by:** F12 hardening (`Session::connect_nmx_auto` returns `RPC_S_SERVER_UNAVAILABLE` (1722) under `cargo test`'s tokio multi-thread runtime — see "Live attempt 2026-05-06" below). The COM-activation path itself works in isolation (`cargo run -p mxaccess-rpc --example com-marshal-probe --features windows-com` succeeds), so the failure is downstream — likely a COM apartment threading issue when CoInitializeEx runs on a tokio worker thread.
|
||||
@@ -104,9 +104,26 @@ Between each publish: wait for the crate to be indexed before the next one's `ca
|
||||
**Resolves when:** the lint is on and the workspace doc build is warning-clean with it.
|
||||
|
||||
### F56 — `subscribe` / `subscribe_buffered` complete on the wire but never receive `0x33` DataUpdate frames
|
||||
**Status:** Diagnosed 2026-05-06 as a **test-fixture issue, not a Rust port bug**. The .NET reference's own `MxNativeClient.Probe --probe-session-subscribe --tag=TestChildObject.TestInt` returns a single `0x32` SubscriptionStatus with `status=3 detail=3 quality=0x00C0 (Uncertain) value=null` and zero `0x33` DataUpdates — same observation as the Rust port's `subscribe` / `subscribe_buffered` paths. The engine on this Galaxy install does not have a live value for `TestChildObject.TestInt`; nothing is scanning that attribute, so there are no value-changes for the engine to dispatch. F49 steps 1-3 need either (a) a different test tag with active scanning, or (b) configuring the local Galaxy to scan TestChildObject.TestInt before live verification can pass.
|
||||
**Status:** **Resolved 2026-05-06.** Root cause: `Session::subscribe` and `Session::subscribe_buffered_nmx` were missing the `INmxService2::Connect` + `AddSubscriberEngine` round-trip that the .NET reference's `MxNativeSession.EnsurePublisherConnected` (`cs:516-526`) issues before the first advise against a given publishing engine. Without that pair of RPCs, NmxSvc accepts the subscription registration but the publishing engine never knows our engine is subscribed — so no `0x33` DataUpdate frames flow.
|
||||
|
||||
Real codec fixes still landed in this session (envelope-peeling for `NmxSubscriptionMessage` + `0x11` registration-result path + split-form RegisterReference body + per-session item_handle counter); they were necessary preconditions for F49 step 1 even if the test fixture blocks the actual pass criterion.
|
||||
Diagnosed via wwtools/aalogcli: the `[Warning] NmxSvc | NmxCallback->DataReceived ... failed with error 0x{N}` log lines turned out to be NmxSvc's normal log spam where N is the bufferSize, NOT an actual error — the .NET reference's own probe triggers identical entries while still receiving `0x33` DataUpdate frames successfully. The real issue was that those frames never started being sent in the first place.
|
||||
|
||||
Fix landed:
|
||||
- `SessionInner::publisher_endpoints` — per-session `HashMap<(platform_id, engine_id), ()>` cache mirroring `MxNativeSession._publisherEndpoints`.
|
||||
- `Session::ensure_publisher_connected(platform_id, engine_id)` — issues `INmxService2::Connect(local_engine, galaxy, platform, engine)` then `AddSubscriberEngine(engine, galaxy, source_platform, local_engine)`, once per publisher endpoint per session.
|
||||
- `Session::subscribe` and `Session::subscribe_buffered_nmx` — both call `ensure_publisher_connected` BEFORE the wire advise.
|
||||
- `subscribe_buffered_nmx` — additionally issues `AdviseSupervisory` after `RegisterReference`. The .NET reference's `RegisterBufferedItemAsync` only calls RegisterReference, but on this AVEVA install RegisterReference alone produces the registration result + heartbeat callbacks without ever starting DataUpdate dispatch; AdviseSupervisory unblocks the dispatch. Difference may be version-specific.
|
||||
|
||||
Live verification passes for both paths against `TestMachine_001.TestChangingInt`:
|
||||
- `cargo test -p mxaccess-compat --features live-windows-com --test plain_subscribe_live` — receives `0x32` SubscriptionStatus + sequence of `0x33` DataUpdate frames.
|
||||
- `cargo test -p mxaccess-compat --features live-windows-com --test buffered_subscribe_live` — same.
|
||||
|
||||
Both tests assert on the raw `Session::callbacks()` broadcast (NMX subscription messages) rather than the typed `Subscription::next` (DataChange) path because `TestChangingInt` on this Galaxy is configured with `quality=0x00C0 (Uncertain) value=null`, so the typed path filters every record. The test gate is "wire-level subscription works"; what the engine reports as the actual value is downstream-Galaxy state, out of scope for the Rust port.
|
||||
|
||||
Real codec fixes ALSO landed in this session as part of F56 investigation (independent from the resolution above):
|
||||
- `NmxSubscriptionMessage::try_parse_process_data_received_body` — peels the `ProcessDataReceived` envelope before calling `parse_inner`. The router previously called `parse_inner` directly on wire bytes, which would have silently dropped any `0x33` even if one arrived.
|
||||
- `NmxReferenceRegistrationResultMessage::try_parse_process_data_received_body` + router branch — drops `0x11` registration-result frames cleanly.
|
||||
- `Session::subscribe_buffered_nmx` — split-form (object, attribute) wire body + per-session monotonic `item_handle` counter (mirrors `MxNativeCompatibilityServer.AddBufferedItemAsync`'s `_nextItemHandle++`).
|
||||
|
||||
**Severity:** P1 — blocks F49 step 1 (F36 buffered live verification), F49 step 2 (F45 recovery replay), and ALL consumers relying on subscription data flow on this Galaxy.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user