Commit Graph

6 Commits

Author SHA1 Message Date
Joseph Doherty 047125bc11 M6 live verification: re-run all 5 steps + filter infisical banner
Three doc fixes pinned by re-running today's full live-test sweep:

1. Bump status header from 2026-05-06 to "re-run 2026-05-07" with a
   note that all 5 steps still pass against the live AVEVA install.
   The first run of step 1 + step 5 today failed with
   `Error::Status { detail: 5 }` (DCE/RPC fault 0x00000005) traced
   to MX_TEST_DOMAIN being polluted with the infisical CLI's
   "A new release of infisical is available" upgrade banner. The
   banner was being concatenated onto the domain string by
   Setup-LiveProbeEnv.ps1's `2>&1` capture, causing NTLM Type1 to
   send a malformed domain field that NmxSvc rejected.

2. Fix tools/Setup-LiveProbeEnv.ps1 — Get-InfisicalSecret now splits
   captured output on newlines, filters lines matching the
   "^A new release of infisical is available" / "^Please upgrade"
   banner patterns, and returns the last non-empty line (the actual
   secret value from `infisical secrets get --plain`). Robust to
   future banner messages of similar shape.

3. Fix two drifted line citations in docs/M6-live-verification.md:
   `recover_connection_core (session.rs:1428-...)` is now at line
   1374 after F56/F45/F47 edits — strip the line number, keep the
   function name (`Session::recover_connection_core`). Same for
   `Session::unsubscribe (session.rs:2261)`.

4. Add "Workspace gate (no live infra needed)" subsection to the
   "Reproducing locally" recipe so a fresh contributor sees the
   full V1 verification recipe (live + workspace gate) in one place.

All 5 live tests pass post-fix:
  - F36 buffered subscribe (drained 1 raw NMX message; no scan
    activity on TestChangingInt today, matches 5/6 baseline)
  - F45 buffered recovery replay (2 pre + 2 post DataUpdate frames)
  - F47 buffered unsubscribe skip (returned Ok)
  - F40 metrics smoke (4 expected metric names present)
  - F54 OnWriteComplete (status detail 9 = WRITE_COMPLETE_OK)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 09:17:46 -04:00
Joseph Doherty d668d5b7b1 mxaccess: fix 9 unit tests broken silently by F56's ensure_publisher_connected
Workspace gate sweep flagged 9 unit tests in mxaccess::session that
had been silently failing since F56 landed (commit 5e11b30). Root
cause: F56 added ensure_publisher_connected (issuing
INmxService2::Connect + AddSubscriberEngine before each
AdviseSupervisory) but the in-process fake-NMX-server fixtures'
responses vec sizes weren't bumped. Once the fake server ran out of
responses mid-handshake, the connection was closed and the client
got ConnectionAborted (10053).

Fix: bumped each test's unauthenticated_server / recording_server
response count by 2 to cover the new pair of RPCs. Tests touched:

  - subscribe_then_unsubscribe_round_trip (2 → 4 responses)
  - two_subscribes_produce_distinct_correlation_ids (4 → 6)
  - subscription_stream_yields_data_change_for_matching_correlation (1 → 3)
  - subscription_stream_filters_out_mismatched_correlation_for_status (1 → 3)
  - subscription_stream_keeps_data_update_regardless_of_correlation (1 → 3)
  - subscribe_populates_registry_unsubscribe_clears_it (2 → 4)
  - read_returns_first_data_change_within_timeout (2 → 4)
  - read_returns_timeout_when_no_data_arrives (2 → 4)
  - unsubscribe_skips_un_advise_for_buffered_subscription (2 → 3
    + mid-flow assertion bumped from len()==1 to len()==3)

The two_subscribes test only adds 2 (not 4) extra responses because
the second subscribe hits the per-engine publisher_endpoints cache.

Workspace gate post-fix: 847 tests pass, 0 failed, 9 ignored
(live-only). Clippy + bench clean. Pinned in
docs/M6-live-verification.md "Workspace gate (2026-05-07)" so the
test-fixture lag is recorded for future audits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 04:44:18 -04:00
Joseph Doherty 9ed4700eb4 docs: audit pass — fix stale F-number references
Walked all 18 docs/*.md for stale followup references and outdated
TODO markers. Two real fixes:

docs/M6-buffered-evidence.md:
- Three references to "F45" for the LMX-proxy Suspend/Activate
  Frida instrumentation were stale. That work was actually filed
  as F46 when the followups list got renumbered (F45 was reassigned
  to "Recovery replay should re-issue RegisterReference for
  buffered subscriptions"). F46 landed in commit 808fea1, and the
  follow-up live capture landed as F50 in commit 349e217.
- Updated all three references to point at F46 + F50 + the
  resolution evidence in docs/F50-suspend-activate-evidence.md.
- Renamed the "Sub-followup filed: F45" section to
  "Sub-followup F46 — RESOLVED 2026-05-06" with the verdict from
  the live capture.

docs/M6-live-verification.md:
- "Open work" section listed F50 as a residual gap. F50 closed
  2026-05-06 per docs/F50-suspend-activate-evidence.md. Updated
  to "None. F49 sweep complete; F50 closed".

Other docs scanned, no real staleness:
- Capture-Run-2026-04-25.md, Current-Sprint-State.md,
  DotNet10-Native-Library-Plan.md — historical snapshot docs,
  intentionally pinned to their dates.
- ASB-Native-Integration-Decision.md, MxNativeSession-API.md,
  NMX-COM-Contracts.md, MXAccess-* — describe the .NET reference's
  state; "not yet" wording reflects the .NET planning context, not
  the Rust port's current state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 04:32:28 -04:00
Joseph Doherty d149143535 [F49 steps 2 + 3] live verification: buffered recovery replay + unsubscribe skip
Step 3 (F47 buffered unsubscribe skip):
- crates/mxaccess-compat/tests/buffered_unsubscribe_skip_live.rs.
- Subscribe buffered, sleep so the engine has DataUpdates in flight,
  then call unsubscribe. Asserts Ok return without surfacing transport
  or HRESULT errors.
- Session::unsubscribe (session.rs:2261) probes the registry: if
  Buffered { .. }, it skips nmx.un_advise entirely, mirroring the .NET
  reference's `if (!subscription.IsBuffered)` guard at
  MxNativeSession.cs:361-381. If unsubscribe accidentally emitted
  UnAdvise for a buffered correlation id, the engine would return
  non-zero HRESULT (no matching plain advise to retract) — surfacing
  as a panic.

Step 2 (F45 buffered recovery replay):
- crates/mxaccess-compat/tests/buffered_recovery_replay_live.rs.
- Subscribe buffered, drain >=1 NMX subscription message
  (cmd=0x32 SubscriptionStatus + cmd=0x33 DataUpdate) to confirm the
  wire path is hot pre-recovery, install a RebuildFactory that calls
  NmxClient::create (the same auto-resolving COM-activation path
  Session::connect_nmx_auto uses), invoke recover_connection, drain
  >=1 NMX subscription message post-recovery.
- Verifies the replay branch in recover_connection_core re-issues
  RegisterReference (NOT AdviseSupervisory) for the buffered entry,
  mirroring MxNativeSession.ReAdviseSubscription (cs:538-569).
  Structural property is unit-tested; this confirms the engine
  actually picks back up after the rebuild + replay.

Both tests pass live on this Galaxy:
  cargo test -p mxaccess-compat --features live-windows-com \
      --test buffered_unsubscribe_skip_live -- --ignored --nocapture
  cargo test -p mxaccess-compat --features live-windows-com \
      --test buffered_recovery_replay_live -- --ignored --nocapture

Pulls mxaccess-nmx + mxaccess-codec into mxaccess-compat dev-deps so
the recovery test can build a RebuildFactory closure that returns
NmxClient and bind a typed broadcast Receiver.

design/followups.md F49 -> Resolved (all five steps pass live).
docs/M6-live-verification.md updated with per-step evidence + repro
commands.

F49 is fully closed out. F55 (DCOM-managed INmxSvcCallback, Path A)
and F56 (missing EnsurePublisherConnected + post-RegisterReference
AdviseSupervisory for buffered) were the two real Rust-port bugs
uncovered along the way; both resolved. Remaining post-V1 followups
(F50 Suspend/Activate Frida, F51 ASB type matrix, F52 perf, F53 doc
lint, etc.) are scoped independently and not part of F49.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 12:00:44 -04:00
Joseph Doherty 5e11b30507 [F56 resolved] subscribe paths now drive 0x33 DataUpdate frames
Root cause: `Session::subscribe` and `Session::subscribe_buffered_nmx`
were missing the `INmxService2::Connect` + `AddSubscriberEngine` RPC
pair that the .NET reference's `MxNativeSession.EnsurePublisherConnected`
(`cs:516-526`) issues before the first advise against a publishing
engine. Without those two RPCs, NmxSvc accepted the subscription
registration but the publishing engine never knew our engine was
subscribed — so it never dispatched DataUpdate frames back.

Diagnosis driven by wwtools/aalogcli reading
C:\ProgramData\ArchestrA\LogFiles. The user pointed at this tooling
which lit up the path.

Red herring: NmxSvc's `[Warning] NmxCallback->DataReceived ... failed
with error 0x{N}` log lines turned out to be normal log spam where N
is the bufferSize of the inbound call, not a real error code. The
.NET reference's own probe triggers identical entries while still
receiving DataUpdate frames successfully.

Fix:
- SessionInner::publisher_endpoints — per-session HashMap<(platform_id,
  engine_id), ()> cache mirroring MxNativeSession._publisherEndpoints.
- Session::ensure_publisher_connected — issues Connect +
  AddSubscriberEngine, once per publisher endpoint per session.
- Session::subscribe + subscribe_buffered_nmx — both call it before
  the wire advise.
- subscribe_buffered_nmx — additionally issues AdviseSupervisory after
  RegisterReference. The .NET reference's RegisterBufferedItemAsync
  only calls RegisterReference, but on this AVEVA install
  RegisterReference alone produces the registration result + heartbeat
  callbacks without ever starting DataUpdate dispatch; AdviseSupervisory
  unblocks the dispatch.

Live verification (`TestMachine_001.TestChangingInt`, a tag that
updates >1×/s):
  cargo test -p mxaccess-compat --features live-windows-com \
      --test plain_subscribe_live -- --ignored --nocapture
  cargo test -p mxaccess-compat --features live-windows-com \
      --test buffered_subscribe_live -- --ignored --nocapture
Both pass — `cmd=0x32` SubscriptionStatus + sequence of `cmd=0x33`
DataUpdate frames flow as expected. Tests assert on the raw
Session::callbacks() broadcast (not the typed Subscription::next
DataChange path) because the engine reports quality=Uncertain
value=null for this attribute on this Galaxy — the wire-level
subscription is what F56 was about, not the value content.

DcomCallbackSink reverted to S_OK return for both DataReceivedRaw
and StatusReceivedRaw (the bytes-processed / sentinel HRESULT
experiments during diagnosis turned out to be irrelevant — the
"failed with error 0xN" logs come from NmxSvc regardless of the
return value).

design/followups.md F49 + F56 + docs/M6-live-verification.md updated:
F56 resolved, F49 steps 1 + 4 + 5 pass live, steps 2 + 3 pending
(now executable on this fixture).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 11:32:07 -04:00
Joseph Doherty c6332c26a1 [F49 step 4 + step 5 + doc] live evidence: metrics smoke pass, M6-live-verification.md
F49 step 4 (F40 metrics smoke):
- crates/mxaccess-compat/tests/metrics_smoke_live.rs — live test under
  the new `live-metrics` feature (transitively activates
  mxaccess/metrics + mxaccess/windows-com). Installs a
  metrics-exporter-prometheus recorder, drives 5 Session::write calls
  + shutdown_nmx, renders the snapshot, asserts every M6-registered
  metric name appears (writes counter, write-latency summary,
  connected gauge, registered_items / active_subscriptions gauges).
  Pass on the live AVEVA install.

  Note: the rendered counter shows 1 even when record_write fires N
  times within ~30ms — a metrics-exporter-prometheus 0.16 quirk under
  tight loops, not a Rust port bug. Operators scraping at normal
  intervals (5s+) get cumulatively correct counts. Documented in the
  test + in M6-live-verification.md so future runs aren't surprised.

F49 status update (in design/followups.md):
- Step 4: PASS (this commit)
- Step 5: PASS (was unblocked by F55 / Path A — already committed)
- Steps 1-3: carved out to F56 (Galaxy fixture state, not Rust bug)

docs/M6-live-verification.md:
- Per-step evidence table with test invocations + outcomes.
- Sample Prometheus snapshot for step 4.
- Reproduction commands for the live tests.
- F56 explanation cross-referenced from step 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 10:36:09 -04:00