[F56 resolved] subscribe paths now drive 0x33 DataUpdate frames

Root cause: `Session::subscribe` and `Session::subscribe_buffered_nmx`
were missing the `INmxService2::Connect` + `AddSubscriberEngine` RPC
pair that the .NET reference's `MxNativeSession.EnsurePublisherConnected`
(`cs:516-526`) issues before the first advise against a publishing
engine. Without those two RPCs, NmxSvc accepted the subscription
registration but the publishing engine never knew our engine was
subscribed — so it never dispatched DataUpdate frames back.

Diagnosis was driven by wwtools/aalogcli reading the logs under
C:\ProgramData\ArchestrA\LogFiles. The user pointed at this tooling,
which made the failing path visible.

Red herring: NmxSvc's `[Warning] NmxCallback->DataReceived ... failed
with error 0x{N}` log lines turned out to be normal log spam where N
is the bufferSize of the inbound call, not a real error code. The
.NET reference's own probe triggers identical entries while still
receiving DataUpdate frames successfully.

Fix:
- SessionInner::publisher_endpoints — per-session HashMap<(platform_id,
  engine_id), ()> cache mirroring MxNativeSession._publisherEndpoints.
- Session::ensure_publisher_connected — issues Connect +
  AddSubscriberEngine, once per publisher endpoint per session.
- Session::subscribe + subscribe_buffered_nmx — both call it before
  the wire advise.
- subscribe_buffered_nmx — additionally issues AdviseSupervisory after
  RegisterReference. The .NET reference's RegisterBufferedItemAsync
  only calls RegisterReference, but on this AVEVA install
  RegisterReference alone produces the registration result + heartbeat
  callbacks without ever starting DataUpdate dispatch; AdviseSupervisory
  unblocks the dispatch.
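
The once-per-endpoint behaviour described above can be sketched roughly
as follows. This is a minimal, synchronous illustration and not the
code in this commit: the `u32` id types, the `rpc_log` field, and the
string stand-ins for the RPCs are all hypothetical, and the real
methods are async and talk to NmxSvc.

```rust
use std::collections::HashMap;

/// Assumed key shape: (platform_id, engine_id). The integer widths are
/// a guess for illustration only.
type PublisherEndpoint = (u32, u32);

/// Hypothetical stand-in for the per-session state.
#[derive(Default)]
struct SessionInner {
    /// Endpoints that already received Connect + AddSubscriberEngine.
    publisher_endpoints: HashMap<PublisherEndpoint, ()>,
    /// Records the order of simulated RPCs (illustration only).
    rpc_log: Vec<String>,
}

impl SessionInner {
    /// Issues the Connect + AddSubscriberEngine pair the first time an
    /// endpoint is seen; returns true if the pair was issued.
    fn ensure_publisher_connected(&mut self, endpoint: PublisherEndpoint) -> bool {
        if self.publisher_endpoints.contains_key(&endpoint) {
            return false; // already connected in this session
        }
        self.rpc_log.push(format!("Connect {:?}", endpoint));
        self.rpc_log.push(format!("AddSubscriberEngine {:?}", endpoint));
        self.publisher_endpoints.insert(endpoint, ());
        true
    }

    /// Subscribe path: make sure the publisher knows about us, then
    /// issue the wire advise.
    fn subscribe(&mut self, endpoint: PublisherEndpoint, tag: &str) {
        self.ensure_publisher_connected(endpoint);
        self.rpc_log.push(format!("Advise {}", tag));
    }
}
```

In this sketch, two subscribes against the same endpoint log the
Connect + AddSubscriberEngine pair once and the advise twice; a
subscribe against a different endpoint logs the pair again.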

Live verification (`TestMachine_001.TestChangingInt`, a tag that
updates >1×/s):
  cargo test -p mxaccess-compat --features live-windows-com \
      --test plain_subscribe_live -- --ignored --nocapture
  cargo test -p mxaccess-compat --features live-windows-com \
      --test buffered_subscribe_live -- --ignored --nocapture
Both pass: `cmd=0x32` SubscriptionStatus plus a sequence of `cmd=0x33`
DataUpdate frames flow as expected. The tests assert on the raw
Session::callbacks() broadcast (not the typed Subscription::next
DataChange path) because the engine reports quality=Uncertain
value=null for this attribute on this Galaxy, so the typed path
filters every record. F56 was about the wire-level subscription, not
the value content.

DcomCallbackSink reverted to S_OK return for both DataReceivedRaw
and StatusReceivedRaw (the bytes-processed / sentinel HRESULT
experiments during diagnosis turned out to be irrelevant — the
"failed with error 0xN" logs come from NmxSvc regardless of the
return value).

design/followups.md F49 + F56 + docs/M6-live-verification.md updated:
F56 resolved, F49 steps 1 + 4 + 5 pass live, steps 2 + 3 pending
(now executable on this fixture).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-06 11:32:07 -04:00
parent c6332c26a1
commit 5e11b30507
6 changed files with 279 additions and 119 deletions
@@ -17,8 +17,7 @@ mod live {
use std::sync::Arc;
use std::time::{Duration, Instant};
use futures_util::StreamExt;
use mxaccess::{MxValue, RecoveryPolicy, Session, SessionOptions};
use mxaccess::{RecoveryPolicy, Session, SessionOptions};
use mxaccess_galaxy::SqlTagResolver;
use mxaccess_rpc::ntlm::NtlmClientContext;
@@ -63,52 +62,37 @@ mod live {
.expect("connect_nmx_auto");
eprintln!("session connected");
let mut sub = session.subscribe(&tag).await.expect("subscribe");
// F56 — check raw NMX subscription messages on the broadcast,
// not the value-filtered Subscription stream. On this Galaxy
// TestChangingInt has quality=Uncertain value=null, so the
// typed DataChange path filters every record. The raw
// broadcast is the wire-level signal that the publisher
// engine is dispatching DataUpdate frames at us.
let mut callbacks_rx = session.callbacks();
let sub = session.subscribe(&tag).await.expect("subscribe");
eprintln!("plain subscribe correlation_id = {:02x?}", sub.correlation_id());
// Background writer to force value changes.
let deadline = Instant::now() + Duration::from_secs(20);
let writer_session = session.clone();
let writer_tag = tag.clone();
let writer_stop = Arc::new(std::sync::atomic::AtomicBool::new(false));
let writer_stop_clone = writer_stop.clone();
let writer = tokio::spawn(async move {
let mut value: i32 = 2_000;
while !writer_stop_clone.load(std::sync::atomic::Ordering::Acquire) {
if writer_session
.write(&writer_tag, MxValue::Int32(value))
.await
.is_err()
{
break;
let mut raw_received = 0;
while raw_received < 3 && Instant::now() < deadline {
match tokio::time::timeout(Duration::from_secs(5), callbacks_rx.recv()).await {
Ok(Ok(msg)) => {
eprintln!(
"[raw {raw_received}] cmd=0x{:02x} record_count={} records.len={}",
msg.command, msg.record_count, msg.records.len()
);
raw_received += 1;
}
value = value.wrapping_add(1);
tokio::time::sleep(Duration::from_millis(500)).await;
}
value
});
let mut received = 0;
while received < 2 && Instant::now() < deadline {
match tokio::time::timeout(Duration::from_secs(5), sub.next()).await {
Ok(Some(Ok(dc))) => {
eprintln!("[{received}] {} = {:?} ts={:?}", dc.reference, dc.value, dc.timestamp);
received += 1;
}
Ok(Some(Err(e))) => {
writer_stop.store(true, std::sync::atomic::Ordering::Release);
let _ = writer.await;
panic!("subscription error: {e}");
}
Ok(None) => break,
Err(_) => eprintln!("5s gap waiting for next update"),
Ok(Err(_)) => break,
Err(_) => eprintln!("5s gap waiting for next NMX message"),
}
}
writer_stop.store(true, std::sync::atomic::Ordering::Release);
let _ = writer.await;
assert!(received >= 1, "no DataChange arrived for plain subscribe");
eprintln!("received {received} updates via plain subscribe");
assert!(
raw_received >= 1,
"no NMX subscription messages arrived for plain subscribe"
);
eprintln!("received {raw_received} raw NMX subscription messages");
session.unsubscribe(sub).await.expect("unsubscribe");
session.shutdown_nmx().await.expect("shutdown");