[F56 resolved] subscribe paths now drive 0x33 DataUpdate frames
Root cause: `Session::subscribe` and `Session::subscribe_buffered_nmx`
were missing the `INmxService2::Connect` + `AddSubscriberEngine` RPC
pair that the .NET reference's `MxNativeSession.EnsurePublisherConnected`
(`cs:516-526`) issues before the first advise against a publishing
engine. Without those two RPCs, NmxSvc accepted the subscription
registration but the publishing engine never knew our engine was
subscribed — so it never dispatched DataUpdate frames back.
Diagnosis was driven by wwtools/aalogcli reading
C:\ProgramData\ArchestrA\LogFiles. The user pointed us at this tooling,
which exposed the missing publisher registration.
Red herring: NmxSvc's `[Warning] NmxCallback->DataReceived ... failed
with error 0x{N}` log lines turned out to be normal log spam where N
is the bufferSize of the inbound call, not a real error code. The
.NET reference's own probe triggers identical entries while still
receiving DataUpdate frames successfully.
Fix:
- SessionInner::publisher_endpoints: per-session
  HashMap<(platform_id, engine_id), ()> cache mirroring
  MxNativeSession._publisherEndpoints.
- Session::ensure_publisher_connected: issues Connect +
  AddSubscriberEngine, once per publisher endpoint per session.
- Session::subscribe + subscribe_buffered_nmx: both call it before
  the wire advise.
- subscribe_buffered_nmx: additionally issues AdviseSupervisory after
  RegisterReference. The .NET reference's RegisterBufferedItemAsync
  only calls RegisterReference, but on this AVEVA install
  RegisterReference alone produces the registration result + heartbeat
  callbacks without ever starting DataUpdate dispatch; AdviseSupervisory
  unblocks the dispatch.
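A minimal sketch of the once-per-endpoint guard described in the Fix list, not the code from this commit. The `PublisherRpc` trait, the `RecordingRpc` test double, the `u32` id types, and the choice to cache only after both RPCs succeed are all assumptions made for illustration; the real calls go through `INmxService2`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the two RPCs named in the commit message.
/// The real binding for `INmxService2` is not shown here.
pub trait PublisherRpc {
    fn connect(&mut self, platform_id: u32, engine_id: u32) -> Result<(), String>;
    fn add_subscriber_engine(&mut self, platform_id: u32, engine_id: u32) -> Result<(), String>;
}

/// Per-session cache keyed by (platform_id, engine_id).
/// The id types are assumed to be u32 for this sketch.
#[derive(Default)]
pub struct PublisherEndpoints {
    connected: HashMap<(u32, u32), ()>,
}

impl PublisherEndpoints {
    /// Issues Connect + AddSubscriberEngine the first time an endpoint is
    /// seen. Returns Ok(true) if the RPC pair was sent, Ok(false) if the
    /// endpoint was already cached.
    pub fn ensure_connected<R: PublisherRpc>(
        &mut self,
        rpc: &mut R,
        platform_id: u32,
        engine_id: u32,
    ) -> Result<bool, String> {
        let key = (platform_id, engine_id);
        if self.connected.contains_key(&key) {
            return Ok(false);
        }
        rpc.connect(platform_id, engine_id)?;
        rpc.add_subscriber_engine(platform_id, engine_id)?;
        // Assumption: record the endpoint only after both RPCs succeed,
        // so a failed attempt is retried on the next subscribe.
        self.connected.insert(key, ());
        Ok(true)
    }
}

/// Test double that records the RPCs in the order they were issued.
#[derive(Default)]
pub struct RecordingRpc {
    pub calls: Vec<String>,
}

impl PublisherRpc for RecordingRpc {
    fn connect(&mut self, platform_id: u32, engine_id: u32) -> Result<(), String> {
        self.calls.push(format!("Connect({},{})", platform_id, engine_id));
        Ok(())
    }

    fn add_subscriber_engine(&mut self, platform_id: u32, engine_id: u32) -> Result<(), String> {
        self.calls.push(format!("AddSubscriberEngine({},{})", platform_id, engine_id));
        Ok(())
    }
}
```

Calling `ensure_connected` from both subscribe paths before the wire advise keeps the RPC pair idempotent per endpoint, which is the behaviour the cache is meant to mirror.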
Live verification (`TestMachine_001.TestChangingInt`, a tag that
updates >1×/s):
    cargo test -p mxaccess-compat --features live-windows-com \
        --test plain_subscribe_live -- --ignored --nocapture
    cargo test -p mxaccess-compat --features live-windows-com \
        --test buffered_subscribe_live -- --ignored --nocapture
Both pass: a `cmd=0x32` SubscriptionStatus followed by a sequence of
`cmd=0x33` DataUpdate frames arrives as expected. The tests assert on
the raw Session::callbacks() broadcast (not the typed
Subscription::next DataChange path) because the engine reports
quality=Uncertain value=null for this attribute on this Galaxy. F56 was
about the wire-level subscription, not the value content.
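A rough sketch of the wire-level check the live tests rely on, assuming a simplified message shape. `RawMessage` and both helper functions are hypothetical; the real `Session::callbacks()` item type is not shown in this commit. Only the command bytes `0x32` and `0x33` come from the text above, and the rule that a status frame must precede the first DataUpdate is an assumption based on the logged order.

```rust
/// Command bytes taken from the commit message.
pub const CMD_SUBSCRIPTION_STATUS: u8 = 0x32;
pub const CMD_DATA_UPDATE: u8 = 0x33;

/// Hypothetical, simplified view of one raw NMX subscription message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RawMessage {
    pub cmd: u8,
    pub record_count: usize,
}

/// Counts DataUpdate frames regardless of their value content.
pub fn count_data_updates(messages: &[RawMessage]) -> usize {
    messages.iter().filter(|m| m.cmd == CMD_DATA_UPDATE).count()
}

/// True when a SubscriptionStatus frame is followed by at least one
/// DataUpdate frame. Quality and value are deliberately ignored.
pub fn subscription_is_live(messages: &[RawMessage]) -> bool {
    match messages.iter().position(|m| m.cmd == CMD_SUBSCRIPTION_STATUS) {
        Some(i) => messages[i + 1..].iter().any(|m| m.cmd == CMD_DATA_UPDATE),
        None => false,
    }
}
```

Checking only the command byte keeps the assertion independent of the `quality=Uncertain value=null` payload, so the test still passes on a tag with no upstream value source.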
DcomCallbackSink is reverted to returning S_OK from both
DataReceivedRaw and StatusReceivedRaw. The bytes-processed / sentinel
HRESULT experiments tried during diagnosis turned out to be irrelevant:
NmxSvc emits the "failed with error 0xN" logs regardless of the
return value.
design/followups.md F49 + F56 + docs/M6-live-verification.md updated:
F56 resolved, F49 steps 1 + 4 + 5 pass live, steps 2 + 3 pending
(now executable on this fixture).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -8,39 +8,39 @@ The sweep is gated on `MX_LIVE=1` env (populate via `tools/Setup-LiveProbeEnv.ps
 | Step | Feature | Test | Outcome |
 |---|---|---|---|
-| 1 | F36 buffered subscribe | `cargo test -p mxaccess-compat --features live-windows-com --test buffered_subscribe_live -- --ignored --nocapture` | **Blocked by F56** — see below. |
-| 2 | F45 buffered recovery replay | (deferred — depends on step 1) | Blocked by F56. |
-| 3 | F47 buffered unsubscribe skip | (deferred — depends on step 1) | Blocked by F56. |
+| 1 | F36 buffered subscribe | `cargo test -p mxaccess-compat --features live-windows-com --test buffered_subscribe_live -- --ignored --nocapture` | **Pass** (resolved by F56 / EnsurePublisherConnected). |
+| 2 | F45 buffered recovery replay | (mid-flight `recover_connection`) | Pending — fixture now available. |
+| 3 | F47 buffered unsubscribe skip | (drop subscription, assert no UnAdvise) | Pending — fixture now available. |
 | 4 | F40 metrics smoke | `cargo test -p mxaccess-compat --features live-metrics --test metrics_smoke_live -- --ignored --nocapture` | **Pass.** |
 | 5 | F54 OnWriteComplete | `cargo test -p mxaccess-compat --features live-windows-com --test lmx_write_complete_live -- --ignored --nocapture` | **Pass** (resolved by F55 / Path A, 2026-05-06). |

-## Step 1 — F36 buffered subscribe (BLOCKED)
+## Step 1 — F36 buffered subscribe (PASS)

-`Session::subscribe_buffered` round-trips successfully on the wire — `RegisterReference` returns HRESULT 0, the engine sends a `0x11` registration result acknowledging `item_handle=1`. The Rust port's wire body is byte-identical to the `.NET` reference's per `crates/mxaccess-codec/tests/buffered_register_reference_parity.rs` (which forward-builds the message from the same inputs `Session::subscribe_buffered` gathers and asserts against `captures/082-frida-add-buffered-plain-advise-testint/`).
+Initially blocked: `Session::subscribe_buffered` round-tripped `RegisterReference` cleanly but no `0x33` DataUpdate frames ever arrived. Plain `Session::subscribe` was affected the same way.

-Despite a successful registration, **no `0x33` DataUpdate frames ever arrive**. Cross-checked against the .NET reference's own probe on the same machine + same tag:
+Root cause: `Session::subscribe` and `Session::subscribe_buffered_nmx` were missing the `INmxService2::Connect` + `AddSubscriberEngine` RPC pair that the .NET reference's `MxNativeSession.EnsurePublisherConnected` (`cs:516-526`) issues before the first advise. Without those two RPCs the publishing engine never registers our engine as a subscriber, so it never dispatches DataUpdate frames back. Logged + fixed in `design/followups.md` as **F56**.

-```text
-dotnet run --project src/MxNativeClient.Probe -c Release -- \
---probe-session-subscribe --tag=TestChildObject.TestInt \
---subscribe-hold-seconds=10 --objref-only
+Diagnosis was driven by `wwtools/aalogcli` reading `C:\ProgramData\ArchestrA\LogFiles`:
+
+```powershell
+& C:\Users\dohertj2\Desktop\wwtools\aalogcli\src\AaLog.Cli\bin\x86\Release\net48\aalog.exe `
+range --from <test-start> --to <test-end> --message "Nmx" --regex
 ```

-Output:
+A red herring along the way: NmxSvc's `[Warning] NmxCallback->DataReceived ... failed with error 0x{N}` log lines turned out to be normal log spam — N is the bufferSize of the inbound call, not a real error code. The .NET reference's own probe triggers identical log entries while still successfully receiving DataUpdate frames.

+After the fix, live test against `TestMachine_001.TestChangingInt` (a tag that updates >1×/s on its own):
+
 ```text
-session_subscribe_correlation=01a9afc9-1a56-4dc7-97bf-22328f4a739b
-session_unparsed_callback size=92 error=Unsupported NMX subscription callback command 0x00.
-session_callback command=0x32 status=3 detail=3 quality=0x00C0 kind=0x02 value=null
-session_subscribe_callbacks=1
+plain subscribe correlation_id = [...]
+[raw 0] cmd=0x32 record_count=1 records.len=1
+[raw 1] cmd=0x33 record_count=1 records.len=1
+[raw 2] cmd=0x33 record_count=1 records.len=1
+received 3 raw NMX subscription messages
+test live::buffered_subscribe_yields_updates ... ok
 ```

-The .NET reference also gets only one `0x32` SubscriptionStatus (`status=3 detail=3 quality=Uncertain value=null`) and zero `0x33` DataUpdates. **Conclusion:** the engine on this Galaxy install does not have an active value source for `TestChildObject.TestInt` — there is nothing scanning the attribute, so no value-changes for the engine to dispatch. F49 step 1 cannot pass against this fixture without one of:
-
-1. A test tag with confirmed active scanning (e.g. an InputSource attribute bound to a PLC simulator or a value-generating Script).
-2. Reconfiguring the local Galaxy to scan `TestChildObject.TestInt`.
-
-Captured in `design/followups.md` as **F56**, marked diagnosed (not a Rust port bug).
+The test asserts on the raw `Session::callbacks()` broadcast (NMX subscription messages), not the value-filtered `Subscription::next` stream, because the engine reports `quality=0x00C0 (Uncertain) value=null` for `TestChangingInt` on this Galaxy. The wire-level subscription works; the null value is a Galaxy-state attribute on a tag that has no real upstream value source. The `MX_TEST_TAG` env var lets operators redirect at runtime — set it to a tag with an actual scanning binding (PLC, OPC, Script) to also exercise the typed `DataChange` path.

 ## Step 4 — F40 metrics live smoke (PASS)
@@ -68,7 +68,7 @@ All four expected names present:
 - `mxaccess_session_connected` (gauge, 0 after `shutdown_nmx`) ✓
 - `mxaccess_session_registered_items` (gauge, 0 since no subscriptions) ✓

-**Note:** the rendered counter shows `1` even though `mxaccess::metrics::record_write` fired 5 times (verified by `RUST_LOG=mxaccess=debug` log line counts). This is a `metrics-exporter-prometheus 0.16` rendering quirk under tight loops where every increment fires within ~30ms — not a Rust port bug. Operators reading the live `/metrics` endpoint at standard scrape intervals (5s+) get a cumulatively correct counter.
+**Note:** the rendered counter shows `1` even though `mxaccess::metrics::record_write` fires 5 times (verified by `RUST_LOG=mxaccess=debug` log line counts). This is a `metrics-exporter-prometheus 0.16` rendering quirk under tight loops where every increment fires within ~30ms — not a Rust port bug. Operators reading the live `/metrics` endpoint at standard scrape intervals (5s+) get a cumulatively correct counter.

 ## Step 5 — F54 OnWriteComplete (PASS — resolved by F55)
@@ -101,12 +101,13 @@ cargo test -p mxaccess-compat --features live-windows-com `
 cargo test -p mxaccess-compat --features live-metrics `
 --test metrics_smoke_live -- --ignored --nocapture

-# 4. Step 1 (will hit F56):
+# 4. Step 1 — F36 buffered subscribe (use a scanning tag):
+$env:MX_TEST_TAG = "TestMachine_001.TestChangingInt"
 cargo test -p mxaccess-compat --features live-windows-com `
 --test buffered_subscribe_live -- --ignored --nocapture
 ```

 ## Open work

-- **F56**: identify a test tag with active scanning OR reconfigure the local Galaxy to scan `TestChildObject.TestInt`. Once F56 unblocks, steps 1, 2, 3 can land in the same commit.
-- **F50**: residual Frida capture for Suspend/Activate (independent of F49; tracked separately).
+- **F49 steps 2 + 3** — recovery replay and unsubscribe-skip live verification. Both have working fixtures now (F56 unblocked), just need the test scaffolding.
+- **F50** — residual Frida capture for Suspend/Activate (independent of F49).