2 Commits

Author SHA1 Message Date
Joseph Doherty 5e11b30507 [F56 resolved] subscribe paths now drive 0x33 DataUpdate frames
Root cause: `Session::subscribe` and `Session::subscribe_buffered_nmx`
were missing the `INmxService2::Connect` + `AddSubscriberEngine` RPC
pair that the .NET reference's `MxNativeSession.EnsurePublisherConnected`
(`cs:516-526`) issues before the first advise against a publishing
engine. Without those two RPCs, NmxSvc accepted the subscription
registration but the publishing engine never knew our engine was
subscribed — so it never dispatched DataUpdate frames back.

Diagnosis driven by wwtools/aalogcli reading
C:\ProgramData\ArchestrA\LogFiles. The user pointed at this tooling
which lit up the path.

Red herring: NmxSvc's `[Warning] NmxCallback->DataReceived ... failed
with error 0x{N}` log lines turned out to be normal log spam where N
is the bufferSize of the inbound call, not a real error code. The
.NET reference's own probe triggers identical entries while still
receiving DataUpdate frames successfully.

Fix:
- SessionInner::publisher_endpoints — per-session HashMap<(platform_id,
  engine_id), ()> cache mirroring MxNativeSession._publisherEndpoints.
- Session::ensure_publisher_connected — issues Connect +
  AddSubscriberEngine, once per publisher endpoint per session.
- Session::subscribe + subscribe_buffered_nmx — both call it before
  the wire advise.
- subscribe_buffered_nmx — additionally issues AdviseSupervisory after
  RegisterReference. The .NET reference's RegisterBufferedItemAsync
  only calls RegisterReference, but on this AVEVA install
  RegisterReference alone produces the registration result + heartbeat
  callbacks without ever starting DataUpdate dispatch; AdviseSupervisory
  unblocks the dispatch.

Live verification (`TestMachine_001.TestChangingInt`, a tag that
updates >1×/s):
  cargo test -p mxaccess-compat --features live-windows-com \
      --test plain_subscribe_live -- --ignored --nocapture
  cargo test -p mxaccess-compat --features live-windows-com \
      --test buffered_subscribe_live -- --ignored --nocapture
Both pass — `cmd=0x32` SubscriptionStatus + sequence of `cmd=0x33`
DataUpdate frames flow as expected. Tests assert on the raw
Session::callbacks() broadcast (not the typed Subscription::next
DataChange path) because the engine reports quality=Uncertain
value=null for this attribute on this Galaxy — the wire-level
subscription is what F56 was about, not the value content.

DcomCallbackSink reverted to S_OK return for both DataReceivedRaw
and StatusReceivedRaw (the bytes-processed / sentinel HRESULT
experiments during diagnosis turned out to be irrelevant — the
"failed with error 0xN" logs come from NmxSvc regardless of the
return value).

design/followups.md F49 + F56 + docs/M6-live-verification.md updated:
F56 resolved, F49 steps 1 + 4 + 5 pass live, steps 2 + 3 pending
(now executable on this fixture).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 11:32:07 -04:00
Joseph Doherty df3457c54a [F56] subscribe / subscribe_buffered: split-form wire body + diagnose Galaxy fixture gap
Three real fixes + one architectural diagnosis:

1. Session::subscribe_buffered_nmx now sends the .NET-reference split
   form on the wire:
     item_definition = "<attr>.property(buffer)"   (was: full reference)
     item_context    = "<object_tag_name>"          (was: empty)
     item_handle     = SessionInner::next_item_handle.fetch_add(1)
                       (was: hardcoded 0)
   Verified byte-identical against captures/082 + 094 by the existing
   buffered_register_reference_parity unit tests. The
   item_handle counter mirrors MxNativeCompatibilityServer's
   _nextItemHandle++ at MxNativeSession.cs:613.

2. New live tests:
   - tests/buffered_subscribe_live.rs (F49 step 1) — uses real Galaxy
     metadata via SqlTagResolver + connect_nmx_auto, drives a
     background writer at 500ms cadence to force value-changes,
     drains DataChange events from Subscription.
   - tests/plain_subscribe_live.rs — same harness over plain
     Session::subscribe (NOT buffered), used to isolate whether
     "no DataUpdate" is buffered-specific (it's not — both fail).

   Both pull tracing-subscriber as a dev-dep so `RUST_LOG=trace`
   surfaces dcom_sink + router activity.

3. mxaccess-galaxy/sql_resolver.rs: drop the inner-attribute
   `#![cfg(feature = "galaxy-resolver")]` — the module-level cfg on
   `pub mod sql_resolver` in lib.rs already handles this and Rust
   1.85's clippy::duplicated_attributes lint flagged the duplicate
   once mxaccess-compat dev-deps activated the feature.

4. F56 finding (diagnosis, NOT a bug fix): the engine on this Galaxy
   install does not have an active value for TestChildObject.TestInt.
   Confirmed by running the .NET reference's own probe:

     dotnet run --project src/MxNativeClient.Probe -c Release \
       -- --probe-session-subscribe --tag=TestChildObject.TestInt \
       --subscribe-hold-seconds=10

   ...returns ONE 0x32 SubscriptionStatus (status=3 detail=3
   quality=0x00C0 Uncertain value=null) and zero 0x33 DataUpdates —
   matching the Rust port's symptom exactly. Not a Rust port bug,
   not a wire-byte gap. F49 steps 1-3 need either an actively-
   scanned tag or local Galaxy reconfiguration to scan
   TestChildObject.TestInt.

Workspace tests + clippy clean under both feature configurations.
F56 entry in design/followups.md updated with the full diagnostic
chain so future-me / future-collaborators can pick it up without
re-tracing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 10:27:08 -04:00