mxaccess/docs/M6-live-verification.md
Joseph Doherty c6332c26a1 [F49 step 4 + step 5 + doc] live evidence: metrics smoke pass, M6-live-verification.md
F49 step 4 (F40 metrics smoke):
- crates/mxaccess-compat/tests/metrics_smoke_live.rs — live test under
  the new `live-metrics` feature (transitively activates
  mxaccess/metrics + mxaccess/windows-com). Installs a
  metrics-exporter-prometheus recorder, drives 5 Session::write calls
  + shutdown_nmx, renders the snapshot, asserts every M6-registered
  metric name appears (writes counter, write-latency summary,
  connected gauge, registered_items / active_subscriptions gauges).
  Pass on the live AVEVA install.

  Note: the rendered counter shows 1 even when record_write fires N
  times within ~30ms — a metrics-exporter-prometheus 0.16 quirk under
  tight loops, not a Rust port bug. Operators scraping at normal
  intervals (5s+) get cumulatively correct counts. Documented in the
  test + in M6-live-verification.md so future runs aren't surprised.

F49 status update (in design/followups.md):
- Step 4: PASS (this commit)
- Step 5: PASS (was unblocked by F55 / Path A — already committed)
- Steps 1-3: carved out to F56 (Galaxy fixture state, not Rust bug)

docs/M6-live-verification.md:
- Per-step evidence table with test invocations + outcomes.
- Sample Prometheus snapshot for step 4.
- Reproduction commands for the live tests.
- F56 explanation cross-referenced from step 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 10:36:09 -04:00


M6 live verification — F49 sweep

Per-feature evidence for the M6 work that landed unit-only and now needs end-to-end confirmation against the live AVEVA install. Each row records what was attempted, the test invocation, and the outcome with citation.

The sweep is gated on the MX_LIVE=1 environment variable (populate it via tools/Setup-LiveProbeEnv.ps1). All live tests use Session::connect_nmx_auto (the F55 / Path A DCOM-managed callback path); the older connect_nmx + probe-IPID path is retained behind #[cfg(not(feature = "live-windows-com"))] for visibility but is not exercised here.
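As a rough illustration of the MX_LIVE gate, the sketch below shows one way a live test could decide whether to run. The helper names `live_enabled` and `live_enabled_from_env` are hypothetical and are not taken from the repository; the actual tests may implement the check differently.

```rust
use std::env;

/// Hypothetical gate: live tests proceed only when the value is exactly "1".
fn live_enabled(value: Option<&str>) -> bool {
    matches!(value, Some("1"))
}

/// Reads MX_LIVE from the process environment and applies the gate.
/// (Assumed usage: call this at the top of an ignored live test and
/// return early when it yields false.)
fn live_enabled_from_env() -> bool {
    live_enabled(env::var("MX_LIVE").ok().as_deref())
}
```

Keeping the comparison in a pure function makes the gate testable without mutating the process environment.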

Status (2026-05-06)

| Step | Feature | Test | Outcome |
| --- | --- | --- | --- |
| 1 | F36 buffered subscribe | cargo test -p mxaccess-compat --features live-windows-com --test buffered_subscribe_live -- --ignored --nocapture | Blocked by F56 (see below). |
| 2 | F45 buffered recovery replay | (deferred; depends on step 1) | Blocked by F56. |
| 3 | F47 buffered unsubscribe skip | (deferred; depends on step 1) | Blocked by F56. |
| 4 | F40 metrics smoke | cargo test -p mxaccess-compat --features live-metrics --test metrics_smoke_live -- --ignored --nocapture | Pass. |
| 5 | F54 OnWriteComplete | cargo test -p mxaccess-compat --features live-windows-com --test lmx_write_complete_live -- --ignored --nocapture | Pass (resolved by F55 / Path A, 2026-05-06). |

Step 1 — F36 buffered subscribe (BLOCKED)

Session::subscribe_buffered round-trips successfully on the wire: RegisterReference returns HRESULT 0, and the engine sends a 0x11 registration result acknowledging item_handle=1. The Rust port's wire body is byte-identical to the .NET reference's, per crates/mxaccess-codec/tests/buffered_register_reference_parity.rs, which forward-builds the message from the same inputs that Session::subscribe_buffered gathers and asserts it against captures/082-frida-add-buffered-plain-advise-testint/.

Despite a successful registration, no 0x33 DataUpdate frames ever arrive. This was cross-checked against the .NET reference's own probe on the same machine with the same tag:

dotnet run --project src/MxNativeClient.Probe -c Release -- \
    --probe-session-subscribe --tag=TestChildObject.TestInt \
    --subscribe-hold-seconds=10 --objref-only

Output:

session_subscribe_correlation=01a9afc9-1a56-4dc7-97bf-22328f4a739b
session_unparsed_callback size=92 error=Unsupported NMX subscription callback command 0x00.
session_callback command=0x32 status=3 detail=3 quality=0x00C0 kind=0x02 value=null
session_subscribe_callbacks=1

The .NET reference also receives only one 0x32 SubscriptionStatus (status=3 detail=3 quality=0x00C0 value=null) and zero 0x33 DataUpdates. Conclusion: the engine on this Galaxy install has no active value source for TestChildObject.TestInt. Nothing is scanning the attribute, so there are no value changes for the engine to dispatch. F49 step 1 cannot pass against this fixture without one of:

  1. A test tag with confirmed active scanning (e.g. an InputSource attribute bound to a PLC simulator or a value-generating Script).
  2. Reconfiguring the local Galaxy to scan TestChildObject.TestInt.

Captured in design/followups.md as F56, marked diagnosed (not a Rust port bug).
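The "registered but no data" diagnosis can be checked mechanically from the probe output. The sketch below is a hypothetical helper (not part of the repository or of the .NET probe) that tallies session_callback lines by command code, using only the 0x32 SubscriptionStatus and 0x33 DataUpdate codes named in this section. It assumes the whitespace-separated key=value line layout shown in the output above.

```rust
use std::collections::HashMap;

/// Splits one probe output line into its leading tag and key=value fields.
/// Tokens after the tag that contain no '=' are ignored.
fn parse_probe_line(line: &str) -> Option<(&str, HashMap<&str, &str>)> {
    let mut tokens = line.split_whitespace();
    let tag = tokens.next()?;
    let fields = tokens.filter_map(|t| t.split_once('=')).collect();
    Some((tag, fields))
}

/// Returns (SubscriptionStatus count, DataUpdate count) for the
/// `session_callback` lines found in the probe output.
fn count_callbacks(output: &str) -> (usize, usize) {
    let mut status = 0;
    let mut data = 0;
    for line in output.lines() {
        let Some((tag, fields)) = parse_probe_line(line) else { continue };
        if tag != "session_callback" {
            continue;
        }
        match fields.get("command").copied() {
            Some("0x32") => status += 1, // SubscriptionStatus
            Some("0x33") => data += 1,   // DataUpdate
            _ => {}
        }
    }
    (status, data)
}
```

For the output above this yields one SubscriptionStatus and zero DataUpdates, which is the signature of the F56 condition.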

Step 4 — F40 metrics live smoke (PASS)

crates/mxaccess-compat/tests/metrics_smoke_live.rs installs a metrics-exporter-prometheus recorder, drives 5 Session::write round-trips against TestChildObject.TestInt, calls shutdown_nmx, and then renders the Prometheus snapshot. It asserts that every M6-registered metric name appears in the rendered output (the gauges legitimately read 0 after shutdown). Sample snapshot:

mxaccess_session_writes{transport="nmx"} 1
mxaccess_session_connected{transport="nmx"} 0
mxaccess_session_active_subscriptions{transport="nmx"} 0
mxaccess_session_registered_items{transport="nmx"} 0
mxaccess_session_write_latency_seconds{transport="nmx",quantile="0"} 0.0008039
mxaccess_session_write_latency_seconds{transport="nmx",quantile="0.5"} 0.0008038...
mxaccess_session_write_latency_seconds{transport="nmx",quantile="0.9"} 0.0008038...
mxaccess_session_write_latency_seconds{transport="nmx",quantile="0.95"} 0.0008038...
mxaccess_session_write_latency_seconds{transport="nmx",quantile="0.99"} 0.0008038...
mxaccess_session_write_latency_seconds{transport="nmx",quantile="0.999"} 0.0008038...
mxaccess_session_write_latency_seconds{transport="nmx",quantile="1"} 0.0012199
mxaccess_session_write_latency_seconds_sum{transport="nmx"} 0.0008039
mxaccess_session_write_latency_seconds_count{transport="nmx"} 1

All expected names are present:

  • mxaccess_session_writes (counter, value ≥ 1) ✓
  • mxaccess_session_write_latency_seconds (summary with sub-millisecond quantiles) ✓
  • mxaccess_session_connected (gauge, 0 after shutdown_nmx) ✓
  • mxaccess_session_registered_items (gauge, 0 since no subscriptions) ✓
  • mxaccess_session_active_subscriptions (gauge, 0 since no subscriptions) ✓

Note: the rendered counter shows 1 even though mxaccess::metrics::record_write fired 5 times (verified by RUST_LOG=mxaccess=debug log line counts). This is a metrics-exporter-prometheus 0.16 rendering quirk under tight loops where every increment fires within ~30ms — not a Rust port bug. Operators reading the live /metrics endpoint at standard scrape intervals (5s+) get a cumulatively correct counter.
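The name check on the rendered snapshot can be approximated with plain string handling. The following is a minimal sketch, not the test's actual code: `metric_names` and `missing_metrics` are hypothetical helpers, and the parsing assumes the simple `name{labels} value` line shape seen in the sample snapshot.

```rust
use std::collections::BTreeSet;

/// Collects metric names from Prometheus text output, dropping labels,
/// values, blank lines, and '#' comment lines.
fn metric_names(snapshot: &str) -> BTreeSet<&str> {
    snapshot
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .filter_map(|l| l.split(|c: char| c == '{' || c.is_whitespace()).next())
        .collect()
}

/// Returns the expected names that do not appear in the snapshot.
fn missing_metrics<'a>(snapshot: &str, expected: &[&'a str]) -> Vec<&'a str> {
    let present = metric_names(snapshot);
    expected
        .iter()
        .copied()
        .filter(|n| !present.contains(*n))
        .collect()
}
```

An empty result from `missing_metrics` means every expected name was rendered. The check is deliberately name-only, so it is unaffected by the counter value discussed in the note above.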

Step 5 — F54 OnWriteComplete (PASS — resolved by F55)

crates/mxaccess-compat/tests/lmx_write_complete_live.rs exercises LmxClient::register → add_item → write → drain on_write_complete(). The test passes against the live AVEVA install with the F55 / Path A DCOM-managed callback path:

connecting via Session::connect_nmx_auto
session connected
add_item(TestChildObject.TestInt) -> h_item=1
write(TestChildObject.TestInt, 42)
OnWriteComplete fired: server=1 item=1 statuses_len=1 is_during_recovery=false
first status: MxStatus { success: 0, category: Unknown, detected_by: Unknown, detail: 9 }
unregistered cleanly

The WriteCompleteEvent { server_handle, item_handle, statuses, is_during_recovery } shape matches the C# LMX_OnWriteComplete(int hServer, int hItem, ref MXSTATUS_PROXY[] pVars) signature. Status detail 9 = WRITE_COMPLETE_OK.
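To make the event shape concrete, here is an illustrative mirror of the fields named above and in the log output. The field types, the single-variant `Category` and `DetectedBy` enums, the `WRITE_COMPLETE_OK` constant, and the `all_write_complete_ok` helper are all assumptions for this sketch; the crate's real definitions may differ.

```rust
/// Detail code this document reports as WRITE_COMPLETE_OK.
const WRITE_COMPLETE_OK: i32 = 9;

/// Placeholder enums: only the `Unknown` value seen in the log is modeled.
#[derive(Debug, Clone, PartialEq)]
enum Category { Unknown }
#[derive(Debug, Clone, PartialEq)]
enum DetectedBy { Unknown }

/// Assumed shape of one status entry, following the logged field names.
#[derive(Debug, Clone, PartialEq)]
struct MxStatus {
    success: i32,
    category: Category,
    detected_by: DetectedBy,
    detail: i32,
}

/// Assumed shape of the event; handles are i32 to match the C# `int` parameters.
#[derive(Debug, Clone, PartialEq)]
struct WriteCompleteEvent {
    server_handle: i32,
    item_handle: i32,
    statuses: Vec<MxStatus>,
    is_during_recovery: bool,
}

impl WriteCompleteEvent {
    /// True when at least one status is present and every status has detail 9.
    fn all_write_complete_ok(&self) -> bool {
        !self.statuses.is_empty()
            && self.statuses.iter().all(|s| s.detail == WRITE_COMPLETE_OK)
    }
}
```

A value built from the logged fields (server=1, item=1, one status with detail 9) would satisfy `all_write_complete_ok`.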

Reproducing locally

# 1. Populate live env from Infisical (dot-source so vars persist).
. .\tools\Setup-LiveProbeEnv.ps1

# 2. Step 5 — F54 OnWriteComplete:
cd rust
cargo test -p mxaccess-compat --features live-windows-com `
    --test lmx_write_complete_live -- --ignored --nocapture

# 3. Step 4 — F40 metrics:
cargo test -p mxaccess-compat --features live-metrics `
    --test metrics_smoke_live -- --ignored --nocapture

# 4. Step 1 (will hit F56):
cargo test -p mxaccess-compat --features live-windows-com `
    --test buffered_subscribe_live -- --ignored --nocapture

Open work

  • F56: identify a test tag with active scanning OR reconfigure the local Galaxy to scan TestChildObject.TestInt. Once F56 unblocks, steps 1, 2, 3 can land in the same commit.
  • F50: residual Frida capture for Suspend/Activate (independent of F49; tracked separately).