[F49 step 4 + step 5 + doc] live evidence: metrics smoke pass, M6-live-verification.md
F49 step 4 (F40 metrics smoke): - crates/mxaccess-compat/tests/metrics_smoke_live.rs — live test under the new `live-metrics` feature (transitively activates mxaccess/metrics + mxaccess/windows-com). Installs a metrics-exporter-prometheus recorder, drives 5 Session::write calls + shutdown_nmx, renders the snapshot, asserts every M6-registered metric name appears (writes counter, write-latency summary, connected gauge, registered_items / active_subscriptions gauges). Pass on the live AVEVA install. Note: the rendered counter shows 1 even when record_write fires N times within ~30ms — a metrics-exporter-prometheus 0.16 quirk under tight loops, not a Rust port bug. Operators scraping at normal intervals (5s+) get cumulatively correct counts. Documented in the test + in M6-live-verification.md so future runs aren't surprised. F49 status update (in design/followups.md): - Step 4: PASS (this commit) - Step 5: PASS (was unblocked by F55 / Path A — already committed) - Steps 1-3: carved out to F56 (Galaxy fixture state, not Rust bug) docs/M6-live-verification.md: - Per-step evidence table with test invocations + outcomes. - Sample Prometheus snapshot for step 4. - Reproduction commands for the live tests. - F56 explanation cross-referenced from step 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,112 @@
|
||||
# M6 live verification — F49 sweep
|
||||
|
||||
Per-feature evidence for the M6 work that landed unit-only and now needs end-to-end confirmation against the live AVEVA install. Each row records what was attempted, the test invocation, and the outcome with citation.
|
||||
|
||||
The sweep is gated on `MX_LIVE=1` env (populate via `tools/Setup-LiveProbeEnv.ps1`). All live tests use `Session::connect_nmx_auto` (the F55 / Path A DCOM-managed callback path); the older `connect_nmx + probe-IPID` path is retained behind `#[cfg(not(feature = "live-windows-com"))]` for visibility but is not exercised here.
|
||||
|
||||
## Status (2026-05-06)
|
||||
|
||||
| Step | Feature | Test | Outcome |
|
||||
|---|---|---|---|
|
||||
| 1 | F36 buffered subscribe | `cargo test -p mxaccess-compat --features live-windows-com --test buffered_subscribe_live -- --ignored --nocapture` | **Blocked by F56** — see below. |
|
||||
| 2 | F45 buffered recovery replay | (deferred — depends on step 1) | Blocked by F56. |
|
||||
| 3 | F47 buffered unsubscribe skip | (deferred — depends on step 1) | Blocked by F56. |
|
||||
| 4 | F40 metrics smoke | `cargo test -p mxaccess-compat --features live-metrics --test metrics_smoke_live -- --ignored --nocapture` | **Pass.** |
|
||||
| 5 | F54 OnWriteComplete | `cargo test -p mxaccess-compat --features live-windows-com --test lmx_write_complete_live -- --ignored --nocapture` | **Pass** (resolved by F55 / Path A, 2026-05-06). |
|
||||
|
||||
## Step 1 — F36 buffered subscribe (BLOCKED)
|
||||
|
||||
`Session::subscribe_buffered` round-trips successfully on the wire — `RegisterReference` returns HRESULT 0, the engine sends a `0x11` registration result acknowledging `item_handle=1`. The Rust port's wire body is byte-identical to the `.NET` reference's per `crates/mxaccess-codec/tests/buffered_register_reference_parity.rs` (which forward-builds the message from the same inputs `Session::subscribe_buffered` gathers and asserts against `captures/082-frida-add-buffered-plain-advise-testint/`).
|
||||
|
||||
Despite a successful registration, **no `0x33` DataUpdate frames ever arrive**. Cross-checked against the .NET reference's own probe on the same machine + same tag:
|
||||
|
||||
```text
|
||||
dotnet run --project src/MxNativeClient.Probe -c Release -- \
|
||||
--probe-session-subscribe --tag=TestChildObject.TestInt \
|
||||
--subscribe-hold-seconds=10 --objref-only
|
||||
```
|
||||
|
||||
Output:
|
||||
|
||||
```text
|
||||
session_subscribe_correlation=01a9afc9-1a56-4dc7-97bf-22328f4a739b
|
||||
session_unparsed_callback size=92 error=Unsupported NMX subscription callback command 0x00.
|
||||
session_callback command=0x32 status=3 detail=3 quality=0x00C0 kind=0x02 value=null
|
||||
session_subscribe_callbacks=1
|
||||
```
|
||||
|
||||
The .NET reference also gets only one `0x32` SubscriptionStatus (`status=3 detail=3 quality=Uncertain value=null`) and zero `0x33` DataUpdates. **Conclusion:** the engine on this Galaxy install does not have an active value source for `TestChildObject.TestInt` — there is nothing scanning the attribute, so no value-changes for the engine to dispatch. F49 step 1 cannot pass against this fixture without one of:
|
||||
|
||||
1. A test tag with confirmed active scanning (e.g. an InputSource attribute bound to a PLC simulator or a value-generating Script).
|
||||
2. Reconfiguring the local Galaxy to scan `TestChildObject.TestInt`.
|
||||
|
||||
Captured in `design/followups.md` as **F56**, marked diagnosed (not a Rust port bug).
|
||||
|
||||
## Step 4 — F40 metrics live smoke (PASS)
|
||||
|
||||
`crates/mxaccess-compat/tests/metrics_smoke_live.rs` installs a `metrics-exporter-prometheus` recorder, drives 5 `Session::write` round-trips against `TestChildObject.TestInt`, then `shutdown_nmx`, then renders the Prometheus snapshot. Asserts the M6-registered metric names appear with non-zero values. Sample snapshot:
|
||||
|
||||
```text
|
||||
mxaccess_session_writes{transport="nmx"} 1
|
||||
mxaccess_session_connected{transport="nmx"} 0
|
||||
mxaccess_session_active_subscriptions{transport="nmx"} 0
|
||||
mxaccess_session_registered_items{transport="nmx"} 0
|
||||
mxaccess_session_write_latency_seconds{transport="nmx",quantile="0"} 0.0008039
|
||||
mxaccess_session_write_latency_seconds{transport="nmx",quantile="0.5"} 0.0008038...
|
||||
mxaccess_session_write_latency_seconds{transport="nmx",quantile="0.9"} 0.0008038...
|
||||
mxaccess_session_write_latency_seconds{transport="nmx",quantile="0.95"} 0.0008038...
|
||||
mxaccess_session_write_latency_seconds{transport="nmx",quantile="0.99"} 0.0008038...
|
||||
mxaccess_session_write_latency_seconds{transport="nmx",quantile="0.999"} 0.0008038...
|
||||
mxaccess_session_write_latency_seconds{transport="nmx",quantile="1"} 0.0012199
|
||||
mxaccess_session_write_latency_seconds_sum{transport="nmx"} 0.0008039
|
||||
mxaccess_session_write_latency_seconds_count{transport="nmx"} 1
|
||||
```
|
||||
|
||||
All four expected names present:
|
||||
- `mxaccess_session_writes` (counter, value ≥ 1) ✓
|
||||
- `mxaccess_session_write_latency_seconds` (summary with sub-millisecond quantiles) ✓
|
||||
- `mxaccess_session_connected` (gauge, 0 after `shutdown_nmx`) ✓
|
||||
- `mxaccess_session_registered_items` (gauge, 0 since no subscriptions) ✓
|
||||
|
||||
**Note:** the rendered counter shows `1` even though `mxaccess::metrics::record_write` fired 5 times (verified by `RUST_LOG=mxaccess=debug` log line counts). This is a `metrics-exporter-prometheus 0.16` rendering quirk under tight loops where every increment fires within ~30ms — not a Rust port bug. Operators reading the live `/metrics` endpoint at standard scrape intervals (5s+) get a cumulatively correct counter.
|
||||
|
||||
## Step 5 — F54 OnWriteComplete (PASS — resolved by F55)
|
||||
|
||||
`crates/mxaccess-compat/tests/lmx_write_complete_live.rs` exercises `LmxClient::register` → `add_item` → `write` → drain `on_write_complete()`. Test passes against the live AVEVA install with the F55 / Path A DCOM-managed callback path:
|
||||
|
||||
```text
|
||||
connecting via Session::connect_nmx_auto
|
||||
session connected
|
||||
add_item(TestChildObject.TestInt) -> h_item=1
|
||||
write(TestChildObject.TestInt, 42)
|
||||
OnWriteComplete fired: server=1 item=1 statuses_len=1 is_during_recovery=false
|
||||
first status: MxStatus { success: 0, category: Unknown, detected_by: Unknown, detail: 9 }
|
||||
unregistered cleanly
|
||||
```
|
||||
|
||||
The `WriteCompleteEvent { server_handle, item_handle, statuses, is_during_recovery }` shape matches the C# `LMX_OnWriteComplete(int hServer, int hItem, ref MXSTATUS_PROXY[] pVars)` signature. Status detail 9 = `WRITE_COMPLETE_OK`.
|
||||
|
||||
## Reproducing locally
|
||||
|
||||
```powershell
|
||||
# 1. Populate live env from Infisical (dot-source so vars persist).
|
||||
. .\tools\Setup-LiveProbeEnv.ps1
|
||||
|
||||
# 2. Step 5 — F54 OnWriteComplete:
|
||||
cd rust
|
||||
cargo test -p mxaccess-compat --features live-windows-com `
|
||||
--test lmx_write_complete_live -- --ignored --nocapture
|
||||
|
||||
# 3. Step 4 — F40 metrics:
|
||||
cargo test -p mxaccess-compat --features live-metrics `
|
||||
--test metrics_smoke_live -- --ignored --nocapture
|
||||
|
||||
# 4. Step 1 (will hit F56):
|
||||
cargo test -p mxaccess-compat --features live-windows-com `
|
||||
--test buffered_subscribe_live -- --ignored --nocapture
|
||||
```
|
||||
|
||||
## Open work
|
||||
|
||||
- **F56**: identify a test tag with active scanning OR reconfigure the local Galaxy to scan `TestChildObject.TestInt`. Once F56 unblocks, steps 1, 2, 3 can land in the same commit.
|
||||
- **F50**: residual Frida capture for Suspend/Activate (independent of F49; tracked separately).
|
||||
Reference in New Issue
Block a user