Commit Graph

142 Commits

Author SHA1 Message Date
Joseph Doherty 047125bc11 M6 live verification: re-run all 5 steps + filter infisical banner
Three doc fixes pinned by re-running today's full live-test sweep:

1. Bump status header from 2026-05-06 to "re-run 2026-05-07" with a
   note that all 5 steps still pass against the live AVEVA install.
   The first run of step 1 + step 5 today failed with
   `Error::Status { detail: 5 }` (DCE/RPC fault 0x00000005) traced
   to MX_TEST_DOMAIN being polluted with the infisical CLI's
   "A new release of infisical is available" upgrade banner. The
   banner was being concatenated onto the domain string by
   Setup-LiveProbeEnv.ps1's `2>&1` capture, causing NTLM Type1 to
   send a malformed domain field that NmxSvc rejected.

2. Fix tools/Setup-LiveProbeEnv.ps1 — Get-InfisicalSecret now splits
   captured output on newlines, filters lines matching the
   "^A new release of infisical is available" / "^Please upgrade"
   banner patterns, and returns the last non-empty line (the actual
   secret value from `infisical secrets get --plain`). Robust to
   future banner messages of similar shape.

3. Fix two drifted line citations in docs/M6-live-verification.md:
   `recover_connection_core (session.rs:1428-...)` is now at line
   1374 after F56/F45/F47 edits — strip the line number, keep the
   function name (`Session::recover_connection_core`). Same for
   `Session::unsubscribe (session.rs:2261)`.

4. Add "Workspace gate (no live infra needed)" subsection to the
   "Reproducing locally" recipe so a fresh contributor sees the
   full V1 verification recipe (live + workspace gate) in one place.

All 5 live tests pass post-fix:
  - F36 buffered subscribe (drained 1 raw NMX message; no scan
    activity on TestChangingInt today, matches 5/6 baseline)
  - F45 buffered recovery replay (2 pre + 2 post DataUpdate frames)
  - F47 buffered unsubscribe skip (returned Ok)
  - F40 metrics smoke (4 expected metric names present)
  - F54 OnWriteComplete (status detail 9 = WRITE_COMPLETE_OK)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 09:17:46 -04:00
Joseph Doherty d668d5b7b1 mxaccess: fix 9 unit tests broken silently by F56's ensure_publisher_connected
Workspace gate sweep flagged 9 unit tests in mxaccess::session that
had been silently failing since F56 landed (commit 5e11b30). Root
cause: F56 added ensure_publisher_connected (issuing
INmxService2::Connect + AddSubscriberEngine before each
AdviseSupervisory) but the in-process fake-NMX-server fixtures'
responses vec sizes weren't bumped. Once the fake server ran out of
responses mid-handshake, the connection was closed and the client
got ConnectionAborted (10053).

Fix: bumped each test's unauthenticated_server / recording_server
response count by 2 to cover the new pair of RPCs. Tests touched:

  - subscribe_then_unsubscribe_round_trip (2 → 4 responses)
  - two_subscribes_produce_distinct_correlation_ids (4 → 6)
  - subscription_stream_yields_data_change_for_matching_correlation (1 → 3)
  - subscription_stream_filters_out_mismatched_correlation_for_status (1 → 3)
  - subscription_stream_keeps_data_update_regardless_of_correlation (1 → 3)
  - subscribe_populates_registry_unsubscribe_clears_it (2 → 4)
  - read_returns_first_data_change_within_timeout (2 → 4)
  - read_returns_timeout_when_no_data_arrives (2 → 4)
  - unsubscribe_skips_un_advise_for_buffered_subscription (2 → 3
    + mid-flow assertion bumped from len()==1 to len()==3)

The two_subscribes test only adds 2 (not 4) extra responses because
the second subscribe hits the per-engine publisher_endpoints cache.

Workspace gate post-fix: 847 tests pass, 0 failed, 9 ignored
(live-only). Clippy + bench clean. Pinned in
docs/M6-live-verification.md "Workspace gate (2026-05-07)" so the
test-fixture lag is recorded for future audits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 04:44:18 -04:00
Joseph Doherty 9ed4700eb4 docs: audit pass — fix stale F-number references
Walked all 18 docs/*.md for stale followup references and outdated
TODO markers. Two real fixes:

docs/M6-buffered-evidence.md:
- Three references to "F45" for the LMX-proxy Suspend/Activate
  Frida instrumentation were stale. That work was actually filed
  as F46 when the followups list got renumbered (F45 was reassigned
  to "Recovery replay should re-issue RegisterReference for
  buffered subscriptions"). F46 landed in commit 808fea1, and the
  follow-up live capture landed as F50 in commit 349e217.
- Updated all three references to point at F46 + F50 + the
  resolution evidence in docs/F50-suspend-activate-evidence.md.
- Renamed the "Sub-followup filed: F45" section to
  "Sub-followup F46 — RESOLVED 2026-05-06" with the verdict from
  the live capture.

docs/M6-live-verification.md:
- "Open work" section listed F50 as a residual gap. F50 closed
  2026-05-06 per docs/F50-suspend-activate-evidence.md. Updated
  to "None. F49 sweep complete; F50 closed".

Other docs scanned, no real staleness:
- Capture-Run-2026-04-25.md, Current-Sprint-State.md,
  DotNet10-Native-Library-Plan.md — historical snapshot docs,
  intentionally pinned to their dates.
- ASB-Native-Integration-Decision.md, MxNativeSession-API.md,
  NMX-COM-Contracts.md, MXAccess-* — describe the .NET reference's
  state; "not yet" wording reflects the .NET planning context, not
  the Rust port's current state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 04:32:28 -04:00
Joseph Doherty 8b50c0fd43 CHANGELOG: curate post-F43 work into V1 entry
The CHANGELOG was cut at F43 and didn't reflect the work that landed
afterwards on the same V1 milestone. Update the V1 [Unreleased] entry
to cover:

Added (since F43):
- F45 — recovery replay re-issues RegisterReference for buffered subs
- F47 — unsubscribe skips UnAdvise for buffered subs
- F49 / F50 / F51 — live verification + Suspend/Activate captures +
  ASB type-matrix expansion with new fixture round-trip tests
- F52.{1,2,3} — codec performance optimisations (BytesMut output,
  thread-local name-signature cache, caller-supplied scratch buffer)
- F54 — per-operation correlation + compat OnWriteComplete fan-out
- F55 — DCOM-managed INmxSvcCallback sink (Path A)
- F56 — Connect/AddSubscriberEngine round-trip in subscribe path
- MxStatus synthesizer kernel ported (settles R3/R4)

Known limitations (post-resolution):
- Drop F45 / F46 / R3+R4 — all resolved.
- Add F53 protocol-crate missing-docs deferral.
- F3 entry now links the new docs/F3-cross-domain-ntlm-recipe.md.

Publish-order section keeps the DAG but flags F48 (no crates.io
publish) up front so anyone reading the recipe knows it's hygiene
not release prep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 04:27:59 -04:00
Joseph Doherty cc99a2d9f0 followups: trim F56's stale pre-resolution analysis
F56's body had a "Resolved 2026-05-06" header followed by ~40 lines
of pre-resolution debugging analysis that contradicted the
resolution: "Likely revised root cause" pointing at DCOM sink IID
mismatches, "But zero 0x33 DataUpdate frames ever arrive", "Action
items for whoever picks F56 up", "Definition of done", "Resolves
when" — all written before the actual root cause (missing
EnsurePublisherConnected round-trip) was identified.

Trim to: status + actual root cause + fix that landed + live
verification + the codec fixes that also landed independently.
The dead-end debugging branches are preserved in this file's git
history for archeology; F56 body now reads as a coherent closeout.

Also fixed line 108's "See Resolved section below for the full
closeout" pointer — the closeout *is* the body; F56 was never moved
to Resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 03:31:26 -04:00
Joseph Doherty ddebab2c2d docs: F3 cross-domain NTLM provisioning recipe
Self-contained doc at docs/F3-cross-domain-ntlm-recipe.md for whoever
picks F3 up on hardware with two AD forests + a forest trust. Covers:

- Lab topology (LAB-A resource forest with AVEVA install + LAB-B
  account forest with the probe user, bidirectional forest trust).
- DC + DNS + trust + user provisioning steps (Install-ADDSForest,
  Add-DnsServerConditionalForwarderZone, New-ADTrust, New-ADUser).
- Capture procedure for both the Rust and .NET probes under a
  `runas /netonly` cross-domain token, with Wireshark NTLMSSP guidance.
- Fixture layout under crates/mxaccess-rpc/tests/fixtures/cross-domain-ntlm/.
- Round-trip test skeleton (replay the captured Type 2 → regenerate
  Type 3 → assert byte-equality against the captured Type 3).
- Redaction checklist for the captured bytes.
- Why F3 is "evidence work" not "codec work" — the AV pair parser
  is shape-agnostic, so the codec path is already correct; the
  fixture is a regression net for any future drift.

F3 entry in design/followups.md and R8 in design/70-risks-and-open-questions.md
both now point at the recipe so a future contributor doesn't have
to reconstruct the lab topology from the followup analysis alone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 02:40:06 -04:00
Joseph Doherty 73e2bd8771 followups: status snapshot for the Open section
After F52 closed, every entry in the Open section except F3 has a
`**Status:**` line documenting its own resolution (Resolved 2026-05-06,
or Out-of-scope). At a glance the section misleadingly looks like 8
live items.

Add a header snapshot calling out that only F3 — cross-domain NTLM
fixture, externally blocked on a second AD domain — is genuinely open.
The other entries stay where they are because the F-numbers in their
analysis are referenced from other followups; moving them to
`## Resolved` would orphan that context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:57:58 -04:00
Joseph Doherty ceeaeefa71 [F52.3] mxaccess-codec: caller-supplied scratch buffer for write encoder
rust / build / test / clippy / fmt (push) Has been cancelled
rust / cargo public-api drift check (F41) (push) Has been cancelled
Adds `write_message::encode_into_bytes_mut` (and the timestamped
variant) which writes the encoded body into a caller-supplied
`BytesMut`. The buffer is cleared and resized in place each call;
once it has grown to the largest body the session will produce, it
allocates nothing further.

A session that holds a single `BytesMut` and reuses it across writes:

  - Int32 / Float32 / Float64: 2 → 1 allocs/op
    (only the `encode_scalar_value` scratch `Vec<u8>` remains)
  - Boolean: 1 → 0 allocs/op
    (no per-value scratch — the literal payload is a stack `[u8; 4]`)

Bench delta in `design/M6-bench-baseline.md` § F52.3. The
`encode_scalar_value` Vec is the remaining 1 alloc/op for fixed-width
scalars; eliminating it would require inlining the LE-bytes write
into the body slice (left for a follow-up since the F52 spec only
asks for 2 → 1).

Resolves F52 (all three optimisations landed: 4e76b44 F52.1,
a0fa5be F52.2, this commit F52.3). Existing `encode` / `encode_to_bytes_mut`
public surface unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:53:07 -04:00
Joseph Doherty a0fa5bedfd [F52.2] mxaccess-codec: thread-local name-signature cache
Adds a thread-local `HashMap<String, u16>` cache inside
`compute_name_signature`. Repeated calls with the same name (the hot
path inside `MxReferenceHandle::from_names`) skip the `to_lowercase`
allocation and the CRC-16/IBM walk entirely. Bounded at 1024 entries
per thread; on overflow the cache is cleared rather than evicted LRU
— any sane workload re-fills only the names it actively uses.

`MxReferenceHandle::from_names` drops from 2 → 0 allocs/op once warm
(bench delta in `design/M6-bench-baseline.md` § F52.2). Cold-path
behaviour is unchanged: first call with a new name still pays the
`to_lowercase` + cache-key `String` allocations.

Two new tests pin the cache: cache-hit returns the same value as
cold-compute, and cache overflow doesn't break correctness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:50:07 -04:00
Joseph Doherty 4e76b44391 [F52.1] mxaccess-codec: BytesMut output buffer for write encoder
Adds `write_message::encode_to_bytes_mut` (and the timestamped variant)
returning a freshly-allocated `BytesMut`. Allocation count is identical
to `encode` (2 allocs/op for fixed-width scalars); the benefit is
downstream — consumers can `BytesMut::split_to` / `freeze` and forward
the body bytes to a wire-level sink without an intermediate copy.

The body builders (`encode_boolean` / `encode_fixed` / `encode_variable`
/ `encode_array`) were refactored to fill a pre-sized `&mut [u8]`
rather than each allocating their own `Vec<u8>`. The dispatcher
computes the body size up front via small `*_body_size` helpers and
resizes the destination buffer (Vec or BytesMut) once. This is also
the prerequisite refactor for F52.3.

Bench delta in `design/M6-bench-baseline.md` § F52.1; existing
`encode` row unchanged at 2 allocs/op. All 265 round-trip tests
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:46:02 -04:00
Joseph Doherty c7505f9570 [F51] live ASB type-matrix: provision UDAs + capture wire fixtures + round-trip tests
rust / build / test / clippy / fmt (push) Has been cancelled
rust / cargo public-api drift check (F41) (push) Has been cancelled
Provisioned 7 new UDAs on $TestMachine via wwtools/graccesscli
object uda add (then deployed to TestMachine_001):

  TestFloat          MxFloat        scalar
  TestFloatArray     MxFloat        array (4)
  TestDouble         MxDouble       scalar
  TestDoubleArray    MxDouble       array (4)
  TestDateTime       MxTime         scalar
  TestDuration       MxElapsedTime  scalar
  TestDurationArray  MxElapsedTime  array (4)

New crates/mxaccess/examples/asb-type-matrix.rs reads all 14 tags
(7 pre-existing + 7 new) in a single batch and dumps the live
AsbVariant bytes per tag when MX_ASB_DUMP_FIXTURES=<dir> is set.
Single-attempt register (no retry — F31 InvalidConnectionId
cool-down re-arms on every retry, making backoff
counter-productive; if the cool-down is engaged, wait 60+ seconds
without ASB activity then re-run).

Captured live evidence (single cold-start run, all 14 register
calls returned error_code=0x0000):

  TestChangingInt   type_id=4  (Int32)        length=4   payload=4
  TestAlarm001      type_id=17 (Boolean)      length=1   payload=1
  MachineCode       type_id=10 (String)       length=30  payload=30
  TestFloat         type_id=8  (Float)        length=4   payload=4
  TestDouble        type_id=9  (Double)       length=8   payload=8
  TestDateTime      type_id=11 (DateTime)     length=8   payload=8
  TestDuration      type_id=12 (ElapsedTime)  length=8   payload=8

  TestIntArray, TestBoolArray, TestStringArray, TestDateTimeArray,
  TestFloatArray, TestDoubleArray, TestDurationArray
                    type_id=0 length=0 payload=0
                    (provisioned but no value written yet)

Per-tag fixture .bin files saved under
crates/mxaccess-codec/tests/fixtures/f51-type-matrix/ — full
14-byte to 40-byte AsbVariant byte sequences (i32 type_id LE +
i32 length LE + payload bytes).

crates/mxaccess-codec/tests/f51_type_matrix_parity.rs round-trips
each scalar fixture: decode -> re-encode -> assert byte-equal +
type_id / length pin. Tests skip with [skip] message when fixtures
are absent (so the suite passes on a fresh checkout without live
captures). 7 scalar tests pass against the captured fixtures.

Array tags excluded from round-trip pinning because the live
engine returns empty payloads for unwritten arrays. Codec-side
array round-trip is covered by asb_variant's existing synthetic-
payload unit tests.

docs/galaxy-test-fixtures.md inventories all $TestMachine UDAs
(pre-existing + F51-provisioned), the graccesscli provisioning
recipe, the fixture-regeneration pattern, and the F31 cool-down
caveat.

design/followups.md F51 marked resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 15:27:31 -04:00
Joseph Doherty 8bd66bbe65 [F53 measurement] document protocol-crate missing-docs magnitude
rust / build / test / clippy / fmt (push) Has been cancelled
rust / cargo public-api drift check (F41) (push) Has been cancelled
Enabled #![warn(missing_docs)] on each of the 7 protocol crates to
measure how many one-liners filling them in would be:

  mxaccess-asb         422
  mxaccess-nmx         398
  mxaccess-callback    371
  mxaccess-galaxy      229
  mxaccess-codec       205
  mxaccess-rpc         147
  mxaccess-asb-nettcp  111
                     -----
  Total              1883

Reverted the lint enables — most of those are protocol-internal
types (struct fields on wire-shape records, enum variants on opcode
discriminators) whose meaning is already documented at the
consumer-facing layer. Filling 1883 one-liners adds noise without
consumer value, and forcing them as errors via RUSTDOCFLAGS would
block routine cargo doc runs.

design/followups.md F53 entry updated with the measured numbers
and the explicit "stays off indefinitely" verdict. If a future
contributor wants per-crate enforcement, the recipe in the strategy
paragraph (allow(missing_docs) on protocol-internal modules,
warn(missing_docs) on the re-export surface) is still valid.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 13:36:03 -04:00
Joseph Doherty 349e217ea3 [F50] live Suspend/Activate captures — Suspend wires opcode 0x2D, Activate client-side
Re-ran analysis/frida/mx-nmx-trace.js (with the F46 hooks for
LmxProxy.dll!CLMXProxyServer.Suspend / .Activate) against
MxTraceHarness on the local AVEVA install. Two captures landed:

- captures/123-frida-suspend-advised-instrumented/
  Scenario: --scenario=suspend-advised --tag=TestChildObject.ScanState

  After mx.suspend.begin/end at 17:23:51.949Z, NMX PutRequest fires
  ~140ms later with body:
    2d 01 00                                       command 0x2D, version 0x0001
    cd 2a ee ec b2 76 06 4f b4 58 5c a0 2d f7 a8 93 16-byte correlation_id (matches the prior AdviseSupervisory)
    01 00 05 00 01 00 02 00 01 00 69 00 0a 00      engine + handle + attribute / property ids
    47 92 00 00 03 00 00 00                        trailer

  TransferData wraps it; HRESULT 0 returned; ProcessDataReceived
  callback delivers a 50-byte op-status frame; LMX surfaces it
  through CUserConnectionCallback.OperationComplete. Suspend is
  unambiguously server-side wire op 0x2D.

- captures/124-frida-activate-advised-instrumented/
  Scenario: --scenario=activate-advised --tag=TestChildObject.ScanState

  Activate fires at 17:26:02.982Z and returns Success synchronously
  with no NMX traffic. The next NMX activity is 7+ seconds later
  (harness teardown). Activate against a non-suspended item is
  client-side only on this build.

The harness's activate-advised scenario doesn't sequence
Suspend-then-Activate, so we don't have direct evidence for
Activate-after-Suspend. Circumstantial reasoning: since Suspend
goes server-side with a state change, Activate likely also does to
revert. If direct evidence becomes needed, add a new
suspend-then-activate scenario to MxTraceHarness/Program.cs and
re-run.

design/70-risks-and-open-questions.md R5 moves to "settled —
Suspend is wire op 0x2D, Activate behaviour is conditional",
severity downgraded P2 -> P3 (no public Session::suspend /
Session::activate API exists today; if added later, 0x2D is the
encoder target).

design/followups.md F50 marked resolved.

docs/F50-suspend-activate-evidence.md: per-capture byte-level
evidence + repro recipe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 13:29:40 -04:00
Joseph Doherty b62ffc8c5d [F48] mark out-of-scope: internal usage only, no crates.io publish
Maintainer confirmed 2026-05-06 the project is internal-use only —
workspace stays at version "0.0.0", consumers depend via path or
git, not crates.io. F48's actual publish goal is dropped.

design/followups.md F48 entry: replace the "P1 release driver"
framing with "Out of scope" + a pointer to the recipe doc in case
this ever changes.

design/F48-publish-dry-run.md: add a banner at the top explaining
the doc is now retained as a workspace-hygiene record (cargo
package --list per crate produces clean tarballs, no captures or
big files), not as release prep. The "What the actual V1 publish
needs" section reframed as "If a publish ever does become a goal —
recipe" so the steps survive without implying they're scheduled.

No code change. F49 / F53 / F55 / F56 status unchanged — those
weren't release-cut-gated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 13:13:24 -04:00
Joseph Doherty e77db4306a [F48 dry-run] validate publish chain on workspace 0.0.0
cargo publish --dry-run on each of the 9 workspace crates:
- Tier 1 leaves (mxaccess-codec, mxaccess-rpc, mxaccess-asb-nettcp)
  pass cleanly. cargo assembles each tarball, the only failure is
  the dry-run upload abort.
- Tiers 2 + 3 (galaxy, callback, asb, nmx, mxaccess, mxaccess-compat)
  surface the documented "no matching package" registry-lookup
  failure because workspace internal deps are pinned at version
  "0.0.0" which doesn't exist on crates.io. Expected; resolves at
  actual publish time once the leaves are uploaded and indexed.

cargo package --list confirms each crate ships only source + tests
+ small round-trip fixtures. No captures, decompiled binaries, or
accidental big files.

design/F48-publish-dry-run.md captures the per-crate run output,
the per-crate file count, and the V1 publish recipe (bump 0.0.0
→ 0.1.0 across workspace + internal-dep pins, publish in tier
order, wait for indexing between tiers, tag).

design/followups.md F48 entry annotated with the dry-run status.
The actual publish to crates.io is deliberately not done — that
needs maintainer auth + a deliberate version bump that's a release-
cut decision, not a routine validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 12:42:22 -04:00
Joseph Doherty c606736ec3 [F53 partial] enable #![warn(missing_docs)] on consumer crates
mxaccess + mxaccess-compat now carry #![warn(missing_docs)] at the
crate root. Every public item has at least a one-line doc comment
(struct fields, enum variants, trait methods all covered).

Touched items:
- mxaccess::lib: DataChange fields, SecurityContext fields,
  TransportKind variants, TransportCapabilities fields,
  RecoveryEvent variants + their inner fields, SessionOptions
  fields, the full Error / ConnectionError / AuthError /
  ProtocolError / ConfigError / SecurityError taxonomy + nested
  fields, Transport trait method docs.
- mxaccess-compat::lib: DataChangeEvent / BufferedDataChangeEvent /
  WriteCompleteEvent / OperationCompleteEvent fields.

Protocol crates (codec, rpc, galaxy, nmx, callback, asb,
asb-nettcp) deliberately left without the lint per F53's strategy
paragraph — their consumers (mxaccess + mxaccess-compat) already
document the surfaces they re-export, and forcing one-liners on
every transport-internal item adds noise without consumer value.

Verification:
- `RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps` clean.
- `cargo test --workspace` (824 tests) green.
- `cargo clippy --workspace --all-targets -- -D warnings` clean.

design/followups.md F53 marked partially resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 12:20:47 -04:00
Joseph Doherty d149143535 [F49 steps 2 + 3] live verification: buffered recovery replay + unsubscribe skip
Step 3 (F47 buffered unsubscribe skip):
- crates/mxaccess-compat/tests/buffered_unsubscribe_skip_live.rs.
- Subscribe buffered, sleep so the engine has DataUpdates in flight,
  then call unsubscribe. Asserts Ok return without surfacing transport
  or HRESULT errors.
- Session::unsubscribe (session.rs:2261) probes the registry: if
  Buffered { .. }, it skips nmx.un_advise entirely, mirroring the .NET
  reference's `if (!subscription.IsBuffered)` guard at
  MxNativeSession.cs:361-381. If unsubscribe accidentally emitted
  UnAdvise for a buffered correlation id, the engine would return
  non-zero HRESULT (no matching plain advise to retract) — surfacing
  as a panic.

Step 2 (F45 buffered recovery replay):
- crates/mxaccess-compat/tests/buffered_recovery_replay_live.rs.
- Subscribe buffered, drain >=1 NMX subscription message
  (cmd=0x32 SubscriptionStatus + cmd=0x33 DataUpdate) to confirm the
  wire path is hot pre-recovery, install a RebuildFactory that calls
  NmxClient::create (the same auto-resolving COM-activation path
  Session::connect_nmx_auto uses), invoke recover_connection, drain
  >=1 NMX subscription message post-recovery.
- Verifies the replay branch in recover_connection_core re-issues
  RegisterReference (NOT AdviseSupervisory) for the buffered entry,
  mirroring MxNativeSession.ReAdviseSubscription (cs:538-569).
  Structural property is unit-tested; this confirms the engine
  actually picks back up after the rebuild + replay.

Both tests pass live on this Galaxy:
  cargo test -p mxaccess-compat --features live-windows-com \
      --test buffered_unsubscribe_skip_live -- --ignored --nocapture
  cargo test -p mxaccess-compat --features live-windows-com \
      --test buffered_recovery_replay_live -- --ignored --nocapture

Pulls mxaccess-nmx + mxaccess-codec into mxaccess-compat dev-deps so
the recovery test can build a RebuildFactory closure that returns
NmxClient and bind a typed broadcast Receiver.

design/followups.md F49 -> Resolved (all five steps pass live).
docs/M6-live-verification.md updated with per-step evidence + repro
commands.

F49 is fully closed out. F55 (DCOM-managed INmxSvcCallback, Path A)
and F56 (missing EnsurePublisherConnected + post-RegisterReference
AdviseSupervisory for buffered) were the two real Rust-port bugs
uncovered along the way; both resolved. Remaining post-V1 followups
(F50 Suspend/Activate Frida, F51 ASB type matrix, F52 perf, F53 doc
lint, etc.) are scoped independently and not part of F49.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 12:00:44 -04:00
Joseph Doherty 5e11b30507 [F56 resolved] subscribe paths now drive 0x33 DataUpdate frames
Root cause: `Session::subscribe` and `Session::subscribe_buffered_nmx`
were missing the `INmxService2::Connect` + `AddSubscriberEngine` RPC
pair that the .NET reference's `MxNativeSession.EnsurePublisherConnected`
(`cs:516-526`) issues before the first advise against a publishing
engine. Without those two RPCs, NmxSvc accepted the subscription
registration but the publishing engine never knew our engine was
subscribed — so it never dispatched DataUpdate frames back.

Diagnosis driven by wwtools/aalogcli reading
C:\ProgramData\ArchestrA\LogFiles. The user pointed at this tooling
which lit up the path.

Red herring: NmxSvc's `[Warning] NmxCallback->DataReceived ... failed
with error 0x{N}` log lines turned out to be normal log spam where N
is the bufferSize of the inbound call, not a real error code. The
.NET reference's own probe triggers identical entries while still
receiving DataUpdate frames successfully.

Fix:
- SessionInner::publisher_endpoints — per-session HashMap<(platform_id,
  engine_id), ()> cache mirroring MxNativeSession._publisherEndpoints.
- Session::ensure_publisher_connected — issues Connect +
  AddSubscriberEngine, once per publisher endpoint per session.
- Session::subscribe + subscribe_buffered_nmx — both call it before
  the wire advise.
- subscribe_buffered_nmx — additionally issues AdviseSupervisory after
  RegisterReference. The .NET reference's RegisterBufferedItemAsync
  only calls RegisterReference, but on this AVEVA install
  RegisterReference alone produces the registration result + heartbeat
  callbacks without ever starting DataUpdate dispatch; AdviseSupervisory
  unblocks the dispatch.

Live verification (`TestMachine_001.TestChangingInt`, a tag that
updates >1×/s):
  cargo test -p mxaccess-compat --features live-windows-com \
      --test plain_subscribe_live -- --ignored --nocapture
  cargo test -p mxaccess-compat --features live-windows-com \
      --test buffered_subscribe_live -- --ignored --nocapture
Both pass — `cmd=0x32` SubscriptionStatus + sequence of `cmd=0x33`
DataUpdate frames flow as expected. Tests assert on the raw
Session::callbacks() broadcast (not the typed Subscription::next
DataChange path) because the engine reports quality=Uncertain
value=null for this attribute on this Galaxy — the wire-level
subscription is what F56 was about, not the value content.

DcomCallbackSink reverted to S_OK return for both DataReceivedRaw
and StatusReceivedRaw (the bytes-processed / sentinel HRESULT
experiments during diagnosis turned out to be irrelevant — the
"failed with error 0xN" logs come from NmxSvc regardless of the
return value).

design/followups.md F49 + F56 + docs/M6-live-verification.md updated:
F56 resolved, F49 steps 1 + 4 + 5 pass live, steps 2 + 3 pending
(now executable on this fixture).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 11:32:07 -04:00
Joseph Doherty c6332c26a1 [F49 step 4 + step 5 + doc] live evidence: metrics smoke pass, M6-live-verification.md
F49 step 4 (F40 metrics smoke):
- crates/mxaccess-compat/tests/metrics_smoke_live.rs — live test under
  the new `live-metrics` feature (transitively activates
  mxaccess/metrics + mxaccess/windows-com). Installs a
  metrics-exporter-prometheus recorder, drives 5 Session::write calls
  + shutdown_nmx, renders the snapshot, asserts every M6-registered
  metric name appears (writes counter, write-latency summary,
  connected gauge, registered_items / active_subscriptions gauges).
  Pass on the live AVEVA install.

  Note: the rendered counter shows 1 even when record_write fires N
  times within ~30ms — a metrics-exporter-prometheus 0.16 quirk under
  tight loops, not a Rust port bug. Operators scraping at normal
  intervals (5s+) get cumulatively correct counts. Documented in the
  test + in M6-live-verification.md so future runs aren't surprised.

F49 status update (in design/followups.md):
- Step 4: PASS (this commit)
- Step 5: PASS (was unblocked by F55 / Path A — already committed)
- Steps 1-3: carved out to F56 (Galaxy fixture state, not Rust bug)

docs/M6-live-verification.md:
- Per-step evidence table with test invocations + outcomes.
- Sample Prometheus snapshot for step 4.
- Reproduction commands for the live tests.
- F56 explanation cross-referenced from step 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 10:36:09 -04:00
Joseph Doherty df3457c54a [F56] subscribe / subscribe_buffered: split-form wire body + diagnose Galaxy fixture gap
Three real fixes + one architectural diagnosis:

1. Session::subscribe_buffered_nmx now sends the .NET-reference split
   form on the wire:
     item_definition = "<attr>.property(buffer)"   (was: full reference)
     item_context    = "<object_tag_name>"          (was: empty)
     item_handle     = SessionInner::next_item_handle.fetch_add(1)
                       (was: hardcoded 0)
   Verified byte-identical against captures/082 + 094 by the existing
   buffered_register_reference_parity unit tests. The
   item_handle counter mirrors MxNativeCompatibilityServer's
   _nextItemHandle++ at MxNativeSession.cs:613.

2. New live tests:
   - tests/buffered_subscribe_live.rs (F49 step 1) — uses real Galaxy
     metadata via SqlTagResolver + connect_nmx_auto, drives a
     background writer at 500ms cadence to force value-changes,
     drains DataChange events from Subscription.
   - tests/plain_subscribe_live.rs — same harness over plain
     Session::subscribe (NOT buffered), used to isolate whether
     "no DataUpdate" is buffered-specific (it's not — both fail).

   Both pull tracing-subscriber as a dev-dep so `RUST_LOG=trace`
   surfaces dcom_sink + router activity.

3. mxaccess-galaxy/sql_resolver.rs: drop the inner-attribute
   `#![cfg(feature = "galaxy-resolver")]` — the module-level cfg on
   `pub mod sql_resolver` in lib.rs already handles this and Rust
   1.85's clippy::duplicated_attributes lint flagged the duplicate
   once mxaccess-compat dev-deps activated the feature.

4. F56 finding (diagnosis, NOT a bug fix): the engine on this Galaxy
   install does not have an active value for TestChildObject.TestInt.
   Confirmed by running the .NET reference's own probe:

     dotnet run --project src/MxNativeClient.Probe -c Release \
       -- --probe-session-subscribe --tag=TestChildObject.TestInt \
       --subscribe-hold-seconds=10

   ...returns ONE 0x32 SubscriptionStatus (status=3 detail=3
   quality=0x00C0 Uncertain value=null) and zero 0x33 DataUpdates —
   matching the Rust port's symptom exactly. Not a Rust port bug,
   not a wire-byte gap. F49 steps 1-3 need either an actively-
   scanned tag or local Galaxy reconfiguration to scan
   TestChildObject.TestInt.

Workspace tests + clippy clean under both feature configurations.
F56 entry in design/followups.md updated with the full diagnostic
chain so future-me / future-collaborators can pick it up without
re-tracing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 10:27:08 -04:00
Joseph Doherty af15fe7587 [F49 step 1 + F56] callback router: peel envelope before parsing subscription / 0x11 frames
The router used to call NmxSubscriptionMessage::parse_inner directly
on the COM-stub-delivered body, but the wire bytes arrive wrapped in
a ProcessDataReceived envelope (46-byte header + optional 4-byte
length prefix); parse_inner expects post-envelope bytes. Result:
every 0x33 DataUpdate that ever arrived was silently dropped.

Mirrors the .NET reference's MxNativeSession.OnCallbackReceived flow
at cs:582-606 — three sequential parse attempts:
  1. NmxOperationStatusMessage::try_parse_process_data_received_body  (already wired)
  2. NmxReferenceRegistrationResultMessage::try_parse_...              (NEW — was missing)
  3. NmxSubscriptionMessage::try_parse_process_data_received_body      (NEW — was wrong)

Adds:
- NmxSubscriptionMessage::try_parse_process_data_received_body — peels
  envelope via NmxObservedEnvelope::parse_process_data_received_body_flexible,
  then dispatches to existing parse_inner.
- NmxReferenceRegistrationResultMessage::try_parse_process_data_received_body —
  same shape, for the 0x11 registration-result frame.
- Router branch for 0x11 — currently traces the assigned item_handle and
  drops the frame (matches the .NET reference, which fires a
  ReferenceRegistrationReceived event with no consumer in the codebase).
- Router fall-through trace! when neither path matches, so future
  unparseable bodies surface in RUST_LOG=trace instead of vanishing.
- DcomCallbackSink::forward — trace! per inbound callback so
  RUST_LOG=mxaccess_callback=trace surfaces opnum + size.
- crates/mxaccess-compat/tests/buffered_subscribe_live.rs — F49 step 1
  live test that drives subscribe_buffered + a 500ms-cadence writer.
  Also pulls tracing-subscriber as a dev-dep so the test can dump
  router activity.

Existing router_task_decodes_callback_invoked_into_broadcast unit test
updated to wrap its synthetic 0x32 body in an envelope so the new
parse path actually accepts it.

Live result: F56 — the buffered round-trip *registers* successfully
(RegisterReference returns HRESULT 0; engine sends one 0x11
RegistrationResult + one 51-byte op-status per write, perfectly
clocked) but the engine never sends a 0x33 DataUpdate. Rust-port-
specific gap vs the .NET reference's working buffered path; root
cause is likely a field-level difference in the RegisterReference
body or a missing post-RegisterReference step. Captured as F56 in
design/followups.md, blocking F49 step 1; F56's DoD is the same
live test reporting >=3 DataChange arrivals.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 09:50:57 -04:00
Joseph Doherty 2fc327a8d5 [F55 Path A] DCOM-managed INmxSvcCallback sink
Replace the hand-rolled CallbackExporter (TCP listener + custom
OBJREF) with a real `windows-rs` `#[implement]` COM class for
INmxSvcCallback, marshalled via CoMarshalInterface. NmxSvc validates
the callback OBJREF by calling IObjectExporter::ResolveOxid against
the local RPCSS at 127.0.0.1:135; hand-rolled OXIDs aren't registered
there, which is why RegisterEngine2 returned RPC_S_SERVER_UNAVAILABLE
(1722) on every live attempt. CoMarshalInterface registers the OXID
with RPCSS automatically, so the SCM-side resolution succeeds.

Mirrors MxNativeSession.CreateRegisteredService (cs:624), which is
the .NET reference's working path:
  ComObjRefProvider.MarshalInterfaceObjRef(callback,
    INmxSvcCallback, DifferentMachine)

Layout:
- mxaccess-callback::dcom_sink — INmxSvcCallback + DcomCallbackSink
  + create_dcom_callback_sink_objref. Forwards inbound calls into
  the same CallbackEvent::CallbackInvoked { opnum, body } shape the
  legacy exporter produces, so callback_router stays path-agnostic.
- Session::from_nmx_client — branched on `windows-com`. Real DCOM
  sink when on; legacy CallbackExporter when off (kept for unit
  tests that run against an in-process fake NMX peer).
- SessionInner.dcom_sink_holder: Option<IUnknownHolder> — keeps the
  COM ref alive for the session's lifetime; shutdown_nmx drops it.
- mxaccess-rpc + mxaccess-callback: windows-rs 0.59 → 0.62. The 0.59
  #[implement] macro generates code that doesn't compile under
  edition 2024; 0.62 is fixed.

Live result: cargo test -p mxaccess-compat --features
live-windows-com --test lmx_write_complete_live -- --ignored
--nocapture passes end-to-end. RegisterEngine2 OK, write
round-trips, OnWriteComplete fires with the captured MxStatus shape.

Unblocks F49 step 5; F55 marked Resolved in design/followups.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 09:25:44 -04:00
Joseph Doherty 0a274af76f [F55] Path C investigation: NmxSvc requires SCM-registered OXID for callbacks
rust / build / test / clippy / fmt (push) Has been cancelled
rust / cargo public-api drift check (F41) (push) Has been cancelled
Captured OBJREF byte structures from both paths via the .NET probe:
- `--probe-callback-marshal`: DCOM-marshalled, 338 bytes, succeeds
  (when used inside `MxNativeSession.Open` → `CreateRegisteredService`).
- `--probe-register-managed-callback`: hand-rolled, 162 bytes, fails
  with `RegisterEngine2 → 0x800706BA RPC_S_SERVER_UNAVAILABLE`.

The structural diff:
- `std_flags`: DCOM=`0x0A80` (SORF_OXRES4+6+8) vs hand-rolled=`0x280`
  (SORF_OXRES4+6). Bit `0x0800` (SORF_OXRES8) only set in DCOM.
- ncacn_ip_tcp bindings: DCOM=4 with no ports; hand-rolled=1 with
  explicit `[port]`.
- Total size: 338 vs 162 bytes.

Tested the simplest fix (hand-rolled `std_flags = 0x0A80` to match
DCOM): **still fails with the same 1722.** Reverted.

**Diagnosis updated in F55:** NmxSvc on receiving RegisterEngine2
appears to call `IObjectExporter::ResolveOxid` against the local
SCM (`127.0.0.1:135`) to resolve the callback OBJREF's OXID, then
dial the resulting bindings. Our hand-rolled OXID is never
registered with RPCSS, so the SCM-side resolution fails and NmxSvc
returns RPC_S_SERVER_UNAVAILABLE — matching:
- the symptom (1722),
- the sub-second timing (no TCP dial-back to our listener attempted),
- the fact that the .NET `ManagedCallbackExporter` (same hand-rolled
  approach) ALSO fails identically.

DCOM marshalling fixes this because `CoMarshalInterface` internally
registers the OXID with RPCSS. The bindings have no port because
RPCSS returns the dynamic port from the DCOM stub layer.

**Conclusion: Path A is the architecturally correct fix** — the
callback exporter must be a DCOM-managed object (e.g. via
`windows-rs` `#[implement]`) for NmxSvc to accept the callback.
The hand-rolled-listener-with-explicit-port approach is
fundamentally incompatible with NmxSvc's callback validation, in
both Rust and the .NET reference.

Path C (cheap investigation) is exhausted; F55 verdict updated to
recommend Path A explicitly.

`cargo test --workspace` 824 passing; clippy `-D warnings` clean
across both feature configurations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 08:55:59 -04:00
Joseph Doherty c5d611d6fa [F12 partial + F55] hold IUnknown for client lifetime + diagnose RegisterEngine2 1722
rust / build / test / clippy / fmt (push) Has been cancelled
rust / cargo public-api drift check (F41) (push) Has been cancelled
**F12 partial improvement** (`mxaccess-rpc::IUnknownHolder` + `mxaccess-nmx`):

- New `IUnknownHolder` newtype that owns an MTA-resident COM proxy
  with `unsafe impl Send + Sync`. Mirrors the .NET reference's
  `ManagedNmxService2Client._activatedComObject` private field
  (`cs:15`).
- New `activate_and_marshal_iunknown_objref(prog_id, ctx)` returns
  `(Vec<u8>, IUnknownHolder)`. Existing
  `marshal_activated_iunknown_objref` retained as a wrapper that
  drops the holder (kept for inline-use callers).
- `NmxClient` gains an `activated_com_object: Option<IUnknownHolder>`
  field, populated by `Self::create` from the new helper.
  `Self::connect` / `Self::from_bound_transport` set it `None` (no
  COM activation in those paths).
- Holding the IUnknown for the client's lifetime keeps the
  SCM-tracked OXID valid; without it the COM ref count drops to
  zero and the SCM may release the activated server-side instance,
  making subsequent `ResolveOxid` / `RemQueryInterface` calls
  return `RPC_S_SERVER_UNAVAILABLE`.

**F55 (new) — hand-rolled callback exporter rejected by RegisterEngine2**

Five-step instrumentation of `Session::connect_nmx_auto` proves all
six COM-activation / RemQI / final-bind steps succeed. The 1722
fault originates at `RegisterEngine2` itself:

```
from_nmx_client: callback hostname="DESKTOP-6JL3KKO" port=57886 obj_ref_len=162
from_nmx_client: callback obj_ref hex: 4d454f57010000...
from_nmx_client: RegisterEngine2 (31112, mxaccess.31112)
from_nmx_client: RegisterEngine2 FAIL: Transport(Fault { status: 2147944122 })
```

Status `0x800706BA` = `RPC_S_SERVER_UNAVAILABLE` wrapped as Win32
HRESULT.

**Critical finding: the .NET reference's `--probe-register-managed-callback`
(which uses the same hand-rolled `ManagedCallbackExporter` approach
as the Rust port) ALSO fails with the same `0x800706BA` fault.**
Only `--probe-session-write`, which uses
`ComObjRefProvider.MarshalInterfaceObjRef(callback, ...)` to build
the OBJREF via Windows DCOM proxy/stub marshalling, succeeds. So
this is an architectural artifact of the hand-rolled-callback
design, not a Rust port regression.

`design/followups.md` F55 entry documents the three resolution
paths (switch to DCOM-marshalled callback / hybrid / continue
investigating OBJREF rejection at NmxSvc).

F49 stays open with a refined diagnostic — the per-feature live
verification is gated on F55's resolution.

Workspace tests still 824 passing; clippy `-D warnings` clean
across both feature configurations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 08:50:30 -04:00
Joseph Doherty e5b31fadb1 [F49] live-test scaffolding for F54 OnWriteComplete + COM probe diagnostic
rust / build / test / clippy / fmt (push) Has been cancelled
rust / cargo public-api drift check (F41) (push) Has been cancelled
Live attempt against AVEVA on this dev host produced two artefacts:

**`crates/mxaccess-compat/tests/lmx_write_complete_live.rs`** — the
F54 OnWriteComplete round-trip test. Compiles + runs against the
live AVEVA install via either path:
- `--features live-windows-com` (preferred): uses
  `Session::connect_nmx_auto` so the COM activation reference is
  held in-process for the duration of the test.
- Default features (fallback): shells out to
  `MxNativeClient.Probe --probe-resolve-oxid-managed-ntlm-integrity`
  + `--probe-remqi-managed` to learn the per-session NMX endpoint +
  INmxService2 IPID, then uses `Session::connect_nmx`.

Both code paths are wired and the test runs through endpoint
resolution + IPID extraction successfully. The connect step itself
fails with `Status { detail: 1722 }` (RPC_S_SERVER_UNAVAILABLE).

**`crates/mxaccess-rpc/examples/com-marshal-probe.rs`** — minimal
one-shot binary that calls
`marshal_activated_iunknown_objref("NmxSvc.NmxService",
DifferentMachine)` in isolation. Confirms the COM activation +
CoMarshalInterface chain works fine standalone (returns a 338-byte
OBJREF with valid OXID/IPID structure). The 1722 in the live test
is therefore downstream of the activation — likely a COM-apartment
threading interaction with the tokio multi-thread runtime.

This is an F12-related issue (auto-resolve hardening), not an F54
issue. F54's correctness is covered by the existing unit-level
integration tests:
- `mxaccess::session::tests::router_populates_operation_status_context_from_pending_ops_fifo`
- `mxaccess::session::tests::write_handle_correlates_with_router_emitted_status`
- `mxaccess_compat::tests::drain_routes_write_status_to_on_write_complete`
- `mxaccess_compat::tests::drain_routes_non_write_status_to_on_operation_complete`

`design/followups.md` F49 entry updated to reflect:
- F54 added as a fifth row in the live-verification scope.
- "Live attempt 2026-05-06" sub-section documents the 1722 issue +
  what was verified (.NET probe end-to-end works against same
  install; Rust COM activation works in isolation; the failure is
  Rust-port-specific to `connect_nmx_auto` under tokio).
- F49 now Blocked-by F12 hardening (the 1722 path).

New `live-windows-com` feature on `mxaccess-compat` propagates to
`mxaccess/windows-com` for the test binary.

Workspace 824 → 824 tests; clippy + rustdoc clean across both
feature configurations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 08:23:01 -04:00
Joseph Doherty 04c10babfb [F54 test] end-to-end smoke: write_with_handle ↔ callback_router boundary
rust / build / test / clippy / fmt (push) Has been cancelled
rust / cargo public-api drift check (F41) (push) Has been cancelled
Adds `write_handle_correlates_with_router_emitted_status` — the
closest-to-live test we can write without an AVEVA endpoint, pinning
the F54 boundary the C# `OnWriteComplete` callback ultimately depends
on.

The existing tests cover the layers individually:
- `write_value_with_handle_inserts_into_pending_ops` — write API
  populates pending_ops with the right correlation id.
- `router_populates_operation_status_context_from_pending_ops_fifo`
  — callback_router consumes a frame + the registry, emits a typed
  OperationStatus with context attached.
- `drain_routes_write_status_to_on_write_complete` (mxaccess-compat)
  — drain function routes Write op_kind to on_write_complete_tx.

What was missing: a test that combines the public `write_value_with_
handle` API with a real callback_router invocation against the SAME
`pending_ops` Arc the write populated. The new test:

1. Builds a Session via `connect_test_session`.
2. Calls `session.write_value_with_handle("TestObj.TestInt", ...)` —
   gets a real `WriteHandle { correlation_id }` and a real entry in
   `pending_ops` (no manual insertion).
3. Spins a parallel `callback_router` over the SESSION's
   `pending_ops` Arc + a fake event_tx (the live exporter's
   internal channel isn't reachable from tests; this is the
   established workaround pattern from
   `router_task_decodes_callback_invoked_into_broadcast`).
4. Injects the proven `WRITE_COMPLETE_OK` 5-byte frame.
5. Asserts the emitted `OperationStatus.context.correlation_id`
   equals the cid the write returned, that op_kind is Write, that
   reference is the original tag string, and that
   `pending_ops` is now empty (one-shot popped).

This closes the integration-test gap the user flagged. Live AVEVA
verification still falls under F49.

Workspace 823 → 824 tests; clippy + rustdoc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 07:57:15 -04:00
Joseph Doherty 4ff511bbed [F54] per-operation correlation + compat OnWriteComplete fan-out
rust / build / test / clippy / fmt (push) Has been cancelled
rust / cargo public-api drift check (F41) (push) Has been cancelled
Closes the residual that R3/R4 Path A's commit `c73a33e` deferred:
the OperationStatus.context field was always None because no
in-flight correlation map existed in SessionInner, and the
mxaccess-compat broadcast channels for OnWriteComplete /
OperationComplete were exposed on the public API but had no
fan-out task draining session events into them.

**mxaccess (Part 1 — per-operation correlation):**

- New `pending_ops: Mutex<HashMap<[u8; 16], OperationContext>>` on
  SessionInner. Populated when `Session::write*` / `subscribe*`
  dispatches an outstanding operation; entry removed when the
  matching OperationStatus event fires (one-shot semantics).
- New `Session::write_with_handle` (and equivalents for the secured /
  timestamped paths) returns a `WriteHandle { correlation_id }` so
  consumers can correlate completions back to their originating
  call. Existing `write` / `write_value` / etc. signatures unchanged
  and delegate to the handle-returning variant.
- Callback router extended to look up `pending_ops` by correlation_id
  on each operation-status event. When found, populates
  `OperationStatus.context: Some(OperationContext { correlation_id,
  op_kind, reference, retry_count: 0 })`. When not found, falls
  through with `context: None` (verbatim-preserve per CLAUDE.md).
- New unit tests assert: matching correlation_id populates context,
  unknown correlation_id leaves context None, the entry is removed
  from `pending_ops` after one event fires.

**mxaccess-compat (Part 2 — compat-layer fan-out):**

- New `correlation_to_item: tokio::sync::Mutex<HashMap<[u8; 16], i32>>`
  on LmxClientInner.
- `LmxClient::write` / `write_2` / `write_secured` / `write_secured_2`
  call `Session::write_with_handle` (or equivalent) and insert
  `correlation_id → item_handle` into the map before returning.
- `LmxClient::register` / `register_asb` spawn a background task that
  drains `session.operation_status_stream()`. Per event, looks up
  `correlation_to_item[event.context?.correlation_id]` to find the
  item_handle, then routes:
  - `OperationKind::Write` / `OperationKind::WriteSecured` →
    `WriteCompleteEvent { server_handle, item_handle, statuses,
    is_during_recovery }` into `on_write_complete_tx`.
  - Other variants → `OperationCompleteEvent { ... }` into
    `on_operation_complete_tx`.
  - Removes the correlation_id from `correlation_to_item` after
    firing (one-shot).
- Events with no matching item_handle (correlation_id not in map)
  are dropped silently — no bogus item_handle=0 events.
- Task cancelled on LmxClient drop via `JoinHandle::abort` (matches
  the existing `subscription_task` pattern).
- New unit tests cover: Write op routes to on_write_complete, Read
  op routes to on_operation_complete, unknown correlation_id is
  dropped.

Result: the C# `LMX_OnWriteComplete(int hLMXServerHandle, int
phItemHandle, ref MXSTATUS_PROXY[] pVars)` callback shape is now
end-to-end-achievable. A consumer calls `LmxClient::write(hServer,
hItem, value, userId)` and drains `client.on_write_complete()`; the
yielded `WriteCompleteEvent` carries the right `(server_handle,
item_handle, statuses, is_during_recovery)` tuple.

Public API: `Session::write_with_handle` + `WriteHandle` are new;
existing signatures unchanged. `cargo public-api` baselines
regenerated under `design/public-api/{mxaccess,mxaccess-compat}.txt`.

Workspace: 765 → 823 tests pass (~58 new tests from F54). Clippy
`-D warnings` clean. Rustdoc `-D warnings` clean.

F54 status in `design/followups.md` moved Open → Resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 07:41:28 -04:00
Joseph Doherty f98ab9846d design/70-risks: record the .NET reference's WriteCompleted half-implementation
R3's verdict gains an aside documenting why the original native
MxAccess `OnWriteComplete` event has historically only fired for the
one exact 5-byte pattern `00 00 50 80 00` (= `MxStatus.WriteCompleteOk`).

Verified at:
- `src/MxNativeClient/MxNativeCompatibilityServer.cs:756` —
  `if (!evt.Message.IsMxAccessWriteComplete) return;` gates the
  consumer-facing `WriteCompleted` event.
- `src/MxNativeCodec/NmxOperationStatusMessage.cs:18` —
  `IsMxAccessWriteComplete` requires
  `Format == StatusWord && StatusCode == 0x8050 && CompletionCode == 0x00`.

Every other completion frame is silently dropped — the 1-byte
`0x00`/`0x41`/`0xEF` ones, plus any non-success status word.

This was the underlying reason R3/R4 looked unsolvable for a year:
the answer "we don't know how to map" was actually "the native
compatibility shim deliberately doesn't map these because firing
typed failure events on ambiguous bytes was never a goal."

Path A's `MxStatus::from_packed_u32` (commit `c73a33e`) closes the
asymmetry on the Rust side: `Session::operation_status_events()`
exposes ALL typed outcomes the upstream synthesizer produces, not
just the WriteCompleteOk slice. The Rust port now has strictly
broader operation-status visibility than the .NET reference offered.

Recorded so future contributors don't re-derive this from scratch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 07:13:28 -04:00
Joseph Doherty c73a33edd8 [R3/R4 Path A] mxaccess: port Lmx.dll FUN_10100ce0 synthesizer kernel
rust / build / test / clippy / fmt (push) Has been cancelled
rust / cargo public-api drift check (F41) (push) Has been cancelled
Path A landed for R3/R4. The byte->MxStatus synthesizer in Lmx.dll is
FUN_10100ce0 (`analysis/ghidra/exports/Lmx.dll.synthesizer-helpers2-decompile.md`),
a 4-byte u32 LE -> 4-tuple MxStatus decoder used by every NMX-frame
parser in Lmx.dll. The kernel is byte-deterministic and context-free,
so it ports as a pure function -- the operation-tracking state
machine the original verdict deferred is NOT required for synthesis.

Bit layout (per FUN_10100ce0 lines 21-24):
  bit 31:        success    (-1 if set, 0 if clear)
  bits 27..24:   category   (4 bits)
  bits 23..20:   detected_by (4 bits)
  bits 15..0:    detail     (i16 -- low 16 bits, signed)
  bits 30..28, 19..16: reserved/padding

Codec changes:
- MxStatus::from_packed_u32() / ::to_packed_u32() -- the kernel +
  inverse for round-trip parity.
- MxStatus::from_nmx_response_code() -- the constructed-from-response-
  code switch in FUN_1010bd10:741-770 (six proven mappings: 0x01, 0x02
  -> CommunicationError + RequestingNmx; 0x03 -> ConfigurationError +
  RequestingNmx; 0x04 -> ConfigurationError + RespondingNmx; 0x05 ->
  CommunicationError + RespondingNmx; 0x1A -> CommunicationError +
  RequestingNmx).
- MxStatusCategory / MxStatusSource: from_i16/to_i16 promoted to const
  fn so MxStatus::from_packed_u32 can be const.
- NmxOperationStatusMessage::try_parse_process_data_received_body() --
  thin wrapper that peels the outer NmxObservedEnvelope before
  delegating to try_parse_inner. Mirrors
  NmxOperationStatusMessage.TryParseProcessDataReceivedBody (.NET cs:20-32).
- NmxOperationStatusMessage::promote_to_typed() -- entry point that
  returns the existing Status field. Documented as a no-op pass-through
  for now (the 5-byte inner-body wire shape is NOT the same field as
  the 4-byte packed-u32 the kernel decodes); kept for API symmetry.
- 22 new round-trip tests covering the kernel, the response-code
  switch, the proven 0x00/0x41/0xEF completion bytes, and round-trip
  for every canonical sentinel.

mxaccess (Session) changes:
- New OperationKind enum (Write/WriteSecured/Read/Subscribe/
  Unsubscribe/Activate/Suspend/Other).
- New OperationContext struct (correlation_id, op_kind, reference,
  retry_count) -- ground for the F54 follow-on per-operation
  correlation work.
- New OperationStatus event type {raw, status, context,
  is_during_recovery}, mirroring MxNativeOperationStatusEvent (cs:73-78)
  with the typed-MxStatus addition.
- Session::operation_status_events() -> broadcast::Receiver<Arc<
  OperationStatus>> + operation_status_stream() Stream variant.
- callback_router() now tries operation-status parsing first, falling
  through to subscription messages -- matches MxNativeSession
  .OnCallbackReceived dispatch order (cs:574,582,590).
- recover_connection() flips a recovery_active counter (Arc<AtomicU32>
  shared with the router) so OperationStatus.is_during_recovery is
  populated correctly. Mirrors MxNativeSession._recoveryActive
  Volatile.Read at cs:573.
- 3 new router tests covering: status-word frame dispatch + typed
  promotion to WriteCompleteOk; completion-only frames stay verbatim;
  is_during_recovery is stamped from the live counter.

Per-operation context tracking (correlating completion frames back to
outstanding writes/subscribes via the correlation_id) is filed as F54
in design/followups.md. The synthesizer kernel itself is byte-
deterministic, so the kernel and the correlation work are decoupled.

Ghidra evidence (the next-ring xref walk beyond FUN_10114a90):
- analysis/ghidra/exports/Lmx.dll.set-attribute-result-xrefs.md --
  xrefs to OnSetAttributeResult / CancelWithStatus / OperationComplete.
- analysis/ghidra/exports/Lmx.dll.vtable-data-xrefs.md -- vtable-slot
  data xrefs for the virtual-dispatch path.
- analysis/ghidra/exports/Lmx.dll.synthesizer-decompile.md --
  ScanOnDemandCallback::OperationComplete/MultipleOperationComplete
  (FUN_1010b990), RemotePlatformResolver::OperationComplete
  (FUN_1010dc80), and the constructed-from-responseCode synthesizer
  in FUN_1010bd10 (lines 698-770). FUN_1010bd10 is the wire-frame
  receiver that drives the synthesis.
- analysis/ghidra/exports/Lmx.dll.synthesizer-helpers-decompile.md --
  FUN_10003fc0 (the <success %d category %d ...> formatter; confirms
  the 4-tuple layout), FUN_1008f150 (dispatch helper).
- analysis/ghidra/exports/Lmx.dll.synthesizer-helpers2-decompile.md --
  FUN_10100ce0 (the kernel itself), FUN_10100bc0 (3xu16 reader),
  FUN_1005e580 (4-byte stream reader), FUN_1010ee00 (sister NMX-frame
  parser using the same kernel).
- analysis/ghidra/exports/Lmx.dll.synthesizer-callers-xrefs.md --
  caller graph; confirms the kernel is called from many wire-frame
  parsers but each parser shares the single 4-byte decoder.

R3/R4 verdict updated in design/70-risks-and-open-questions.md from
"settled at verbatim-preserve" to "settled per Path A". F54 filed in
design/followups.md for the per-operation correlation work.

cargo build / test / clippy -D warnings / RUSTDOCFLAGS=-D warnings doc
all clean. cargo public-api baselines regenerated for mxaccess and
mxaccess-codec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 07:08:36 -04:00
Joseph Doherty 460c61df43 [R3/R4] Path-A trace: synthesizer is in Lmx.dll's NMX-frame decoder
Five-stage Ghidra headless decompile traces the byte-to-MXSTATUS_PROXY
synthesis path end-to-end across LmxProxy.dll and Lmx.dll. New evidence
files committed alongside R3/R4 verdict update:

- analysis/ghidra/exports/LmxProxy.dll.fire-event-xrefs.md
- analysis/ghidra/exports/LmxProxy.dll.status-synthesis-decompile.md
- analysis/ghidra/exports/LmxProxy.dll.mxstatus-safearray-decompile.md
- analysis/ghidra/exports/Lmx.dll.set-attribute-result-decompile.md

Layer-by-layer findings (bytes flow inward; synthesis flows outward):

1. `Lmx.aaDCT` at 0x10178fc0 is `SysAllocString(L"Lmx.aaDCT")` — a
   tracing category BSTR, not a table.
2. `MXSTATUS_PROXY` is a 16-byte marshalled struct (4 × i16 padded
   to i32 boundaries with Pack=4) — the OUTPUT of synthesis, not a
   lookup entry.
3. `LmxProxy.dll` Fire_* event handlers receive already-populated
   `MXSTATUS_PROXY[]` and forward through ATL dispatch — no synthesis.
4. `LmxProxy.dll` Fire_* CALLERS (FUN_1001657f / FUN_10016b50 /
   FUN_10016d4b) call FUN_10003f60(out_safearray, in_status_ptr,
   count=1) which is a VERBATIM memcpy of an existing 14-byte buffer
   into the SAFEARRAY — no transformation.
5. `Lmx.dll`'s `PreboundReference::OnSetAttributeResult` (FUN_10114a90)
   receives an already-populated `short *param_7` status buffer. Log
   line confirms the layout: `<success %d category %d detectedBy %d
   detail %d>`. Dispatches on typed values — synthesis is upstream of
   this function too.

The synthesizer is the NMX-frame decoder in Lmx.dll that calls
OnSetAttributeResult / OnGetAttributeResult / equivalent
OperationComplete handler. The decoder takes raw NMX bytes plus
operation context (item handle, engine state, retry state,
correlation id) and computes the populated MXSTATUS_PROXY. There is
NO static lookup table — synthesis is per-message contextual.

Two viable paths to typed promotion (both substantial; neither a
small codec patch):

- Path A: port the synthesizer. ~1-2 weeks. Requires extending the
  Rust session to track per-operation context (handles, retries,
  correlation ids). Out of V1 scope.
- Path B: empirical capture pairs. ~30 min × 6-10 scenarios. Output
  is a (byte, context → status) mapping that approximates without
  re-implementing. Risk: mapping is only valid for captured contexts.

R3/R4 stay settled at verbatim-preserve. The .NET reference does
the same for the same reason: the synthesizer is too context-
dependent to mirror without porting the entire operation-tracking
state machine.

Reopen criteria sharpened: either (a) a consumer files a concrete
use case for typed promotion of a specific byte+context combination
(Path B's empirical capture for that one combination is the cheapest
answer); or (b) a major-version bump justifies the state-machine
port (Path A).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 06:33:02 -04:00
Joseph Doherty 4dfc0cee65 [R3 + R4 + R8] settle protocol-level risks per Ghidra evidence
Ghidra headless decompile of `Lmx.dll`'s `aaDCT` symbol + the
`LmxProxy.dll` Fire_* event handlers (logs at
`analysis/ghidra/exports/Lmx.dll.aadct-decompile.md` and
`analysis/ghidra/exports/LmxProxy.dll.completion-status-decompile.md`)
settles **R3** and **R4** as "no static byte→status lookup table
exists":

- `Lmx.aaDCT` at `0x10178fc0` is a `SysAllocString(L"Lmx.aaDCT")` into
  a global BSTR — a logging category name, not a table.
- `MXSTATUS_PROXY` is a 4-field struct (success/category/detectedBy/
  detail), used as the marshalled COM event payload — not a static
  array of pre-mapped statuses.
- `Fire_OnDataChange` / `Fire_OnWriteComplete` / `Fire_OperationComplete` /
  `Fire_OnBufferedDataChange` (RVAs 0x15f72, 0x1611f, 0x16271, 0x163c0
  in `LmxProxy.dll`) receive ALREADY-POPULATED `MXSTATUS_PROXY[]`
  arrays — the byte-to-struct synthesis happens upstream in the
  proxy's NMX-callback ingestion code, not via a table lookup. The
  synthesis is per-event computation from operation context (engine
  ids, item handles, retry counters), not a static promotion.

R3/R4 status updated from "indefinitely deferred — no Ghidra table"
to "settled — no table exists; verbatim preservation is the canonical
answer." The .NET reference's `NmxOperationStatusMessage.TryParseInner`
+ the Rust port's `mxaccess-codec/src/operation_status.rs` already
match this canonical behaviour; no code change required.

Reopen R3/R4 only if a context-aware capture surfaces a per-byte
synthesis logic that depends on operation context — at which point
the codec would need access to the originating operation's context,
which is upstream of the bytes themselves.

**R8** marked permanently deferred — implementation already parses
NTLM AV pairs per [MS-NLMP] §2.2.2.1 (including the cross-domain
shapes `MsvAvDnsTreeName` / `MsvAvDnsComputerName` carrying the
trusted-domain DNS suffix), what's missing is the live capture, and
the live capture requires a multi-domain Windows lab not available
on this dev host. Same status pattern as F3 in `design/followups.md`.

Open evidence gaps table updated to reflect:
- Cross-domain NTLM: deferred (R8)
- Ghidra mapping table for completion-only bytes: no table exists
  (R3/R4 settled)
- Activate/Suspend transition (wire): partial (F44 + F46), live re-run
  pending (F50)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 06:23:05 -04:00
Joseph Doherty 0e93e3a8fa design/followups: file F48-F53 for known V1 residuals
After the M6 closeout sweep (F35-F47 all resolved), six residuals
remain that were either documented inline in design docs or implied
by closure language but not formally tracked as F-numbers. File them
explicitly so the next iteration has a clean starting list:

- F48: Execute `cargo publish` for V1 (F43 was dry-run only). Documents
  the 9-crate dependency-ordered publish sequence + the version bump
  from 0.0.0 placeholder to 0.1.0.
- F49: Live verification sweep for F36 (buffered subscribe) +
  F45 (recovery replay) + F47 (unsubscribe skip) + F40 (metrics).
  Closes the gap where these features ship with unit tests but
  weren't live-exercised against AVEVA in the closing iteration.
- F50: Run the F46 Suspend/Activate Frida capture live (script
  ready, capture deferred to maintainer-side).
- F51: Live type-matrix expansion for `asb-subscribe` — Bool /
  Float / Double / String / DateTime / Duration / arrays. F32 was
  closed via "deployable maximum" but the codec supports more types
  than the live matrix exercises.
- F52: Codec performance optimisations from F39 (BytesMut output
  buffer, name-signature cache, session scratch pool). Documented as
  post-V1 in M6-bench-baseline.md; filing them as F-numbers so the
  alloc-count deltas are tracked when they land.
- F53: Enable `#![warn(missing_docs)]` workspace-wide. Deferred from
  F42 — the lint surfaces hundreds of low-priority gaps that need a
  dedicated pass.

R3, R4, R8, F3 already in their respective tracking docs (the risks
register + the Open section's permanently-external-blocked entry).

Open section now contains: F3 (permanently external-blocked) +
F48-F53 (V1-residual triage).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 06:11:11 -04:00
Joseph Doherty 25befcb72e design/followups: move F45 + F47 to Resolved (M6 + spawned closures)
rust / build / test / clippy / fmt (push) Has been cancelled
rust / cargo public-api drift check (F41) (push) Has been cancelled
F45 (commit 9b57cf8) and F47 (commit 1a1830f) close the buffered-
subscription recovery + unsubscribe symmetry gap that F36 left open.
The Open section now contains only F3 (cross-domain NTLM Type1/2/3
fixture, permanently external-blocked on this single-domain dev
host — needs multi-domain Windows lab).

This is the end-state for V1: all M0-M6 followups resolved plus the
two M6-spawned follow-ons. F3 stays Open as a documented external
gap; reopen it if the dev host gains a second domain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:59:38 -04:00
Joseph Doherty 1a1830f3bf [F47] mxaccess: unsubscribe skips UnAdvise for buffered subscriptions
Mirrors the .NET reference's `if (!subscription.IsBuffered)` guard
at `MxNativeSession.cs:361-381`. The Rust port previously emitted an
`UnAdvise` frame for both plain and buffered subscriptions; the
buffered server-side registration is unwound by the engine when the
`RegisterReference` handle goes away, so emitting an `UnAdvise` for
buffered entries is at best a no-op extra frame and at worst could
race with the engine's own teardown.

Fix: branch `Session::unsubscribe` on `SubscriptionEntry::mode` (the
discriminator F45 added). For `SubscriptionMode::Buffered { ... }`,
skip the `un_advise` call and proceed directly to registry cleanup.
For `SubscriptionMode::Plain`, retain the previous behaviour.

The registry-entry probe runs first (separate lock acquisition) so
the `is_buffered` decision doesn't hold the NMX-client mutex
unnecessarily — common case where the entry is plain still acquires
the NMX lock immediately after.

The metrics counter `record_unadvise()` still fires on every public
`unsubscribe` call regardless of mode — it tracks consumer-side
unsubscribe rate, not wire-frame rate. That matches what dashboards
expect from the public API.

New unit test `unsubscribe_skips_un_advise_for_buffered_subscription`
issues a plain subscribe (recorded as 1 RPC), mutates the registry
entry to `SubscriptionMode::Buffered`, calls unsubscribe, and
asserts the recorded RPC count stays at 1 (no UnAdvise emitted).
The existing `subscribe_populates_registry_unsubscribe_clears_it`
test serves as the negative control for the plain branch.

Workspace 794 → 795 tests; clippy clean; rustdoc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:58:57 -04:00
Joseph Doherty 9b57cf8f3b [F45] mxaccess: recovery replay re-issues RegisterReference for buffered subs
`Session::recover_connection_core` previously walked
`SessionInner::subscriptions` and replayed every entry via
`AdviseSupervisory`, which lost the `.property(buffer)` registration
on buffered subscriptions — silently downgrading buffered → plain on
transport rebuild.

Fix:

- New `pub(crate) enum SubscriptionMode { Plain, Buffered { ... } }`
  discriminator carried on each `SubscriptionEntry`. Buffered variant
  retains the un-suffixed reference + the rounded interval (so the
  re-issued buffered registration matches the original cadence) +
  the empty `item_context` / zero `item_handle` matching the wire
  send.
- `Session::subscribe` (plain path) records `SubscriptionMode::Plain`.
  `subscribe_buffered_nmx` records `SubscriptionMode::Buffered { ... }`.
- `recover_connection_core` matches on `entry.mode`. Plain branch
  unchanged. Buffered branch re-applies `.property(buffer)` via
  `to_buffered_item_definition` (idempotent), rebuilds the original
  `NmxReferenceRegistrationMessage` with the saved correlation id +
  `subscribe = true`, and dispatches `register_reference` (kind=
  ItemControl, inner command 0x10) against the replacement
  transport. Mirrors `MxNativeSession.ReAdviseSubscription`
  (`MxNativeSession.cs:538-569`).

New unit test `recover_connection_replays_buffered_subscription_via_
register_reference` synthesises a buffered registry entry, installs a
`RebuildFactory` pointing at a recording NMX server, drives
`recover_connection`, then asserts the recorded `TransferData` carries
inner command `0x10` (NOT `0x1f`) with the `.property(buffer)`-
suffixed item_definition + the saved correlation id + subscribe=true.

Side-finding worth filing separately: `Session::unsubscribe`
unconditionally calls `un_advise` for both plain and buffered
entries, but the .NET reference's `Unsubscribe`
(`MxNativeSession.cs:361-381`) skips `UnAdvise` for buffered
(`if (!subscription.IsBuffered)`). Out of scope for F45 (recovery-
only); will file as F47.

Public API unchanged. `SubscriptionMode` + `SubscriptionEntry` stay
`pub(crate)` — `cargo public-api -p mxaccess` baseline is unchanged.

Workspace 793 → 794 tests; clippy clean; rustdoc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:54:30 -04:00
Joseph Doherty 2281309a86 design/followups: move F46 to Resolved (Frida hooks landed) 2026-05-06 05:43:43 -04:00
Joseph Doherty 808fea18a0 [F46] analysis/frida: Suspend/Activate hooks + R5 next-step
Closes the wire-side gap left by capture 077 in F44's R5 walk. The Frida
script now hooks the production LmxProxy.dll dispatchers so a future live
re-run on the AVEVA host can answer "does CLMXProxyServer issue a separate
ORPC method for Suspend/Activate, or are they synthesised client-side?"

Hooks added in `analysis/frida/mx-nmx-trace.js`:
- `LmxProxy.dll!CLMXProxyServer.Suspend`  @ RVA 0x13d9c (FUN_10013d9c)
- `LmxProxy.dll!CLMXProxyServer.Activate` @ RVA 0x14028 (FUN_10014028)

Both RVAs were extracted from
`analysis/ghidra/exports/LmxProxy.dll.string-refs.tsv` rows 119/122 (the
`CLMXProxyServer::Suspend - Server Handle` / `Activate - Server Handle`
log strings each xref one function — same pattern as the existing
AdviseSupervisory hook at 0x142b4). The hooks emit `mx.suspend.begin/end`
and `mx.activate.begin/end` events with serverHandle, itemHandle, and the
`MxStatus*` out parameter decoded as 4 x int16 (Success / Category /
DetectedBy / Detail per `src/MxNativeCodec/MxStatus.cs`). Naming matches
the F46 spec's `mx.<verb>.begin / end` grep convention rather than the
generic `call.enter / leave` shape because we want to filter these out
of large traces without false positives from other LmxProxy entrypoints.

No `Resume` / `Reactivate` exports exist in `LmxProxy.dll` — verified
against `analysis/ghidra/exports/LmxProxy.dll.ghidra.md` (no such string
xrefs) and the decompiled `ILMXProxyServer5` / `ILMXProxyServer4`
interfaces under `analysis/decompiled-mxaccess/ArchestrA/MxAccess/`
(only Suspend and Activate are declared on the dispatch interface).

The script's top-of-file comment now carries the live re-run procedure
(rebuild MxTraceHarness x86, attach Frida with `--scenario=suspend-advised`
then `--scenario=activate-advised`, save under
`captures/NNN-frida-suspend-activate-instrumented/`, grep the new TSV for
`mx.suspend.*` / `mx.activate.*` and correlate with `nmx.enter` events
in the same time window). Live capture is intentionally deferred to the
maintainer per the F46 spec — this dev box has no AVEVA install.

`design/70-risks-and-open-questions.md` R5 status updated:
- Title flag `(filed as F45)` -> `(filed as F46, hook landed pending live re-run)`
  (the docs/M6-buffered-evidence.md footnote referenced F45 from before
  F45 / F46 were de-conflicted by commit 2120dfa).
- New "Next step - F46" paragraph documents the two hooked RVAs, the
  out-param decode shape, and the verified absence of Resume / Reactivate
  symbols.
- "Current best answer" paragraph re-points the residual ORPC question
  at F46.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:42:57 -04:00
Joseph Doherty c7e71e4424 design/followups: move F41 + F43 to Resolved (M6 complete)
rust / build / test / clippy / fmt (push) Has been cancelled
rust / cargo public-api drift check (F41) (push) Has been cancelled
All 10 M6 sub-followups (F35-F44 minus the ones absorbed into F44)
plus F41 + F43 are now resolved. Open section narrows to:
- F45: buffered recovery replay (sub-followup of F36)
- F46: Suspend/Activate wire emission (sub-followup of F44)
- F3: cross-domain NTLM fixture (permanently external-blocked)

M6 closeout: see CHANGELOG.md for the V1 release notes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:34:45 -04:00
Joseph Doherty 7b15c853d1 [F43] release prep: CHANGELOG + cargo publish --dry-run validation
V1 release prep per M6 DoD bullet 6:

**`CHANGELOG.md`** — V1 release notes covering all 9 workspace crates
(`mxaccess-codec`, `mxaccess-rpc`, `mxaccess-asb-nettcp`,
`mxaccess-asb`, `mxaccess-galaxy`, `mxaccess-callback`,
`mxaccess-nmx`, `mxaccess`, `mxaccess-compat`), the M0 → M6
milestone closeouts, deliberate divergences from the .NET reference
(multi-record DataUpdate codec relaxation per F44; buffered single-
sample stream per R2), and known limitations (F3 cross-domain NTLM,
F45 buffered recovery replay, F46 Suspend/Activate wire instrumentation,
R3/R4 OperationComplete trigger). Documents the dependency-ordered
publish sequence (leaf crates first; dependent crates require their
deps to exist on crates.io before their dry-runs can run).

**`cargo publish --dry-run` validation:**
- Leaf crates (mxaccess-codec, mxaccess-rpc, mxaccess-asb-nettcp):
  dry-run passes — tarball builds, metadata complete, license/
  description/repository/rust-version all present via
  `workspace.package`.
- Dependent crates (mxaccess-asb, mxaccess-galaxy, mxaccess-callback,
  mxaccess-nmx, mxaccess, mxaccess-compat): dry-run fails with
  "no matching package" against crates.io — expected behaviour, the
  registry lookup happens even with `--no-verify`. Validation of
  these crates falls to the build-test-clippy-public_api matrix
  rather than dry-run.

`design/followups.md`: F43 moved to Resolved with a verdict pointing
at this commit + the CHANGELOG.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:33:43 -04:00
Joseph Doherty f0c9dd2214 rust: add version specifiers to workspace path deps for cargo publish 2026-05-06 05:31:57 -04:00
Joseph Doherty 9e57bfd451 [F41 + F44 reconciliation] cargo public-api baselines + multi-record DataUpdate codec
**F41 — public-api baselines (M6 DoD bullet 5)**

`design/public-api/{crate}.txt` for all 9 workspace crates, generated
via `cargo +nightly public-api --simplified -p <crate>`. Per-crate
baseline sizes:
- mxaccess-codec: 2516 lines
- mxaccess-asb:   1258 lines
- mxaccess-rpc:   1273 lines
- mxaccess-asb-nettcp: 708 lines
- mxaccess: 542 lines
- mxaccess-galaxy: 374 lines
- mxaccess-callback: 170 lines
- mxaccess-compat: 123 lines
- mxaccess-nmx: 118 lines

`design/public-api/README.md` documents the update procedure
(install nightly + cargo-public-api, regenerate the affected baseline
on intentional API changes, commit alongside).

`.github/workflows/rust.yml` gains a `public-api` job that runs the
same diff against the committed baseline; drift fails CI with a
unified diff in the log so the PR author can either revert or
update the baseline.

**F44 reconciliation — multi-record DataUpdate codec**

Cherry-picked from the F44 sub-agent's worktree (commit `aec6a0c`):
`subscription_message.rs::parse_data_update` now loops over
`record_count` like `parse_subscription_status` does, accepting any
positive count. The .NET reference still hard-throws on
`record_count != 1`; the Rust codec deliberately diverges per the F44
evidence walk against `captures/094-frida-buffered-separate-writer/
frida-events.tsv:145` (a `0x33` DataUpdate body with `record_count = 2`,
inner_length = 23 (preamble) + 2 * 19 (records) = 61, post a
separate-session writer triggering two value changes inside one
`SetBufferedUpdateInterval(1000)` window).

Two new round-trip tests:
- `data_update_multi_record_round_trip` — synthesises a 2-record body,
  parses, asserts both records decode to expected Int32 values.
- `data_update_capture_094_truncated_record_errors` — truncates the
  capture-094 fixture mid-second-record, asserts CodecError::Decode.

New wire-byte fixtures under `crates/mxaccess-codec/tests/fixtures/m6-buffered/`:
- `094-line145-dataupdate-recordcount2.bin` (57 bytes, `0x33` multi-record)
- `094-line48-substatus-recordcount2.bin` (101 bytes, `0x32` multi-record)

R2 in `design/70-risks-and-open-questions.md` updated from
"single-sample (settled silently)" to "settled per option (a) — codec
relaxed; multi-record observed in production-stack tracing."

`design/followups.md`: F44's verdict updated to reflect the
contradiction-then-relaxation, with reference to the new tests +
fixtures.

Workspace 792 → 794 tests pass; clippy clean; rustdoc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:27:11 -04:00
Joseph Doherty 2120dfa965 design/followups: move F35/F40/F44 to Resolved + de-conflict F45/F46
rust / build / test / clippy / fmt (push) Has been cancelled
After commits d5aa152 (F35) and ad1cf23 (F36+F40+F44), three M6
sub-followups belong under Resolved with concise verdicts referencing
the matching commits.

Sub-agent merge cleanup:
- Two sub-agents independently filed new followup F45 in parallel —
  rename the Suspend/Activate wire-emission gap to F46, leaving the
  buffered-recovery-replay item as F45 (filed by the F36 work since
  it's the more immediate dependent).
- Open section now contains only F41 + F43 + F45 + F46 + F3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:15:13 -04:00
Joseph Doherty ad1cf2351c [F36 + F40 + F44] M6 wave 1: subscribe_buffered (NMX) + metrics + evidence
Three M6 sub-followups landed in this wave (sub-agent worktrees +
manual reconciliation in main):

**F36 — Session::subscribe_buffered (NMX) per R2 single-sample**
- `BufferedOptions::rounded_update_interval_ms()` — 100ms rounding
  helper mirroring MxNativeCompatibilityServer.cs:638
  ((updateInterval + 99) / 100) * 100, saturating on overflow.
- `Session::subscribe_buffered` (public, lib.rs:604) delegates to
  the new private `subscribe_buffered_nmx` which uses the buffered
  RegisterReference path: item_definition suffixed with
  `.property(buffer)`, subscribe=true (no separate
  AdviseSupervisory follow-up — verified against capture 082).
- Per R2 verified at wwtools/mxaccesscli/docs/api-notes.md the wire
  semantic is single-sample-per-event with a server-side cadence
  knob; rounded_ms is held client-side only (native MXAccess does
  not emit a separate SetBufferedUpdateInterval RPC, verified by
  absence in 079/082 captures).
- New crates/mxaccess/examples/subscribe-buffered.rs.
- New crates/mxaccess-codec/tests/buffered_register_reference_parity.rs:
  4 tests (capture 079/082 round-trip, suffix helper, constructive
  forward-build vs capture 082).

**F40 — Optional metrics feature**
- New crates/mxaccess/src/metrics.rs (275 lines): `pub(crate)`
  thin wrappers (`record_write_latency`, `record_read_latency`,
  `inc_writes`, `inc_reads`, `inc_advises`, `inc_recovery_*`,
  `set_active_subscriptions`, etc.) that compile to no-ops under
  `#[cfg(not(feature = "metrics"))]`. Call sites in session.rs +
  asb_session.rs invoke them unconditionally; the gate is inside
  the wrapper.
- `metrics = { version = "0.24", optional = true }` added to
  workspace + mxaccess crate Cargo.toml.
- Default build: zero metrics dep, zero runtime cost.

**F44 — Buffered batch + suspend capture decode evidence**
- New docs/M6-buffered-evidence.md: per-capture summary for
  077, 079, 080, 081, 082, 094 — call sequence, key wire bytes,
  R2/R5 verdict.
- R2 confirmed silently as "not a real risk" — single-sample
  observed across 079/080/082/094.
- R5 trigger conditions documented from capture 077: AdviseSupervisory
  + Suspend pair, 1-second intervals, succeeds on enum attributes.
- design/70-risks-and-open-questions.md R2/R5 status updated.

Workspace: 759 → 792 tests, clippy clean, rustdoc -D warnings clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:12:17 -04:00
Joseph Doherty d5aa152b1f [F35] mxaccess-compat: LMXProxyServer-shaped facade (18 methods)
Replace the 8-line `mxaccess-compat` stub with a real `LmxClient`
struct exposing the 18 `ILMXProxyServer5` methods as Rust async fns
on top of `mxaccess::Session` (NMX) and `mxaccess::AsbSession` (ASB).

Handle-table approach
* `Mutex<HashMap<i32, ItemRef>>` for item handles, populated by
  `add_item` / `add_item_2` / `add_buffered_item`, drained by
  `remove_item` / `unregister`.
* `Mutex<HashMap<i32, UserRef>>` for user handles allocated by
  `authenticate_user` / `archestra_user_to_id`.
* `AtomicI32` monotonic counters for both, matching the .NET
  reference's `_nextItemHandle` / `_nextUserHandles` per
  `MxNativeCompatibilityServer.cs:62-63`.

Stream-based event surface (per Q4)
* `OnDataChange` / `OnBufferedDataChange` / `OnWriteComplete` /
  `OperationComplete` exposed as `EventStream<T>: Stream<Item=T>`,
  backed by `tokio::sync::broadcast` channels. Lag silently skips
  past `BroadcastStream::Lagged` to keep the public `Item` shape
  ergonomic. NOT COM events — that's the post-V1
  `mxaccess-compat-com` crate per design/70-risks-and-open-questions.md
  Q4. The `OperationComplete` channel is wired but no firing path
  is modelled (R3 deferred — no captured byte mapping yet).
* `Advise` / `AdviseSupervisory` spawn a background fan-out task
  that drains the `Subscription` stream and routes each
  `DataChange` to either `on_data_change` or
  `on_buffered_data_change` based on the item's `is_buffered` flag.
  `UnAdvise` / `RemoveItem` abort the task.

Pass-through methods
* `Write` / `Write2` -> `Session::write` / `write_with_timestamp`
  (`userId` ignored — the underlying surface uses engine identity).
* `WriteSecured2` -> `Session::write_secured_at` with both user ids
  always passed (R6: single-user secured = same id twice; never
  gated).
* `AdviseSupervisory` collapses onto `Session::subscribe` because
  the wire path is `AdviseSupervisory` already (`session.rs:1057`),
  matching the .NET reference's `cs:251-259` identical collapse.
* `SetBufferedUpdateInterval` rounds up to nearest 100 ms per
  `MxNativeCompatibilityServer.cs:638`.

Stubbed pass-throughs (mirror upstream `Error::Unsupported`)
* `WriteSecured` (no timestamp) — `Session::write_secured` is
  stubbed at `crates/mxaccess/src/lib.rs:472` (only
  `WriteSecured2`/`0x3A` is ported); workaround documented inline.
* `AddBufferedItem` allocates the handle but `Advise` for buffered
  items does not yet drive `Session::subscribe_buffered` cadence
  knob — TODO(F36) flagged inline at `add_buffered_item` and
  `set_buffered_update_interval`.

Tests (25 new, all green)
* Handle-table lifecycle: Add -> Advise -> UnAdvise -> Remove with
  a mocked subscription task.
* Monotonic handle allocation; context-prefix combination.
* `SetBufferedUpdateInterval` rounding (50 -> 100, 101 -> 200, etc.)
  + zero-rejection.
* Compile-time check that all 18 LMX methods are reachable on
  `LmxClient`.
* Each event stream yields published items; lag silently dropped.
* GUID-shape validation; server-handle mismatch errors.

Build hygiene
* `cargo build -p mxaccess-compat` clean.
* `cargo test -p mxaccess-compat` -> 25 passed.
* `cargo clippy -p mxaccess-compat --all-targets -- -D warnings` clean.
* `RUSTDOCFLAGS=-D warnings cargo doc -p mxaccess-compat --no-deps` clean.

Deferred / TODOs
* TODO(F36): wire `set_buffered_update_interval` cadence into the
  `advise` path for buffered items.
* TODO(R3): plumb a real trigger into `on_operation_complete` once
  the byte mapping lands.
* TODO(wave 2): live integration tests against AVEVA.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:06:26 -04:00
Joseph Doherty a1c4c6203e design/followups: move F37/F38/F39/F42 to Resolved
rust / build / test / clippy / fmt (push) Has been cancelled
Four M6 sub-followups closed in this session — moved to Resolved
section with concise verdicts referencing the matching commits:

- F37 (commit 34045c2): ASB subscribe_buffered returns Unsupported
- F38 (commit 71c69b8): counting-allocator bench harness + R12
  baseline showing the target is already met
- F39 (closed-via-F38): zero-copy pass not needed for R12 target
  (1-4 allocs/op across the proven matrix); remaining
  optimisations documented as post-V1 work
- F42 (commit e79e289): cargo doc --workspace --no-deps clean

Open M6 work remaining: F35 (compat facade), F36 (NMX
subscribe_buffered), F40 (metrics feature), F41 (public-api
baseline), F43 (release prep), F44 (capture decode evidence).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:47:38 -04:00
Joseph Doherty 71c69b80c6 [F38] mxaccess-codec: counting-allocator bench harness + R12 baseline
Hand-rolled GlobalAlloc wrapper around System that tracks allocs +
bytes + deallocs via two atomics. Each scenario runs 10k iterations
after a 1k warm-up; output is a markdown table with allocs/op,
bytes/op, deallocs/op.

Why hand-rolled (not dhat/criterion): R12 gates on a single number
("< 5 allocs/write"). dhat is heap-profiling-oriented (call-stack
attribution, JSON snapshots); criterion measures wall-clock latency
which is reported-but-not-gated per 60-roadmap.md:104. A 50-line
GlobalAlloc + atomic counters is the simplest thing that answers
the gate.

Run: `cargo bench -p mxaccess-codec`

Baseline numbers (release, Windows x64):
- Bool write:    1.00 allocs/op
- Int32 write:   2.00 allocs/op
- Float32 write: 2.00 allocs/op
- Float64 write: 2.00 allocs/op
- String write:  4.00 allocs/op (5-char string)
- Handle from_names: 2.00 allocs/op
- DataUpdate decode: 1.00 alloc/op

R12's < 5 allocs/write target is **already met** across the proven
matrix without any zero-copy work. The bench gates on this — any
write_message::encode scenario at >= 5 allocs/op exits the harness
with code 1.

Companion: `design/M6-bench-baseline.md` documents the numbers,
explains the per-scenario breakdown, and tightens F39's scope from
"hit the target" to "nice-to-have optimisations" (BytesMut output
buffer, name-signature cache, session-level scratch pool).

Workspace: 759 tests still pass; clippy --benches clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:45:33 -04:00
Joseph Doherty e79e289743 [F42] cargo doc --workspace --no-deps clean (0 warnings)
Fix all 33 rustdoc warnings across the workspace:

- Unresolved intra-doc links: rewrite [`name`] → either backtick text
  (when not actually a link) or fully-qualified `[Type::method]` /
  `[crate::module::name]` form. Affected: mxaccess-codec
  (asb_variant, item_control, metadata_query, observed_write_template,
  reference_handle, write_message), mxaccess-rpc (pdu), mxaccess-nmx
  (client), mxaccess-asb-nettcp (nmf), mxaccess-callback (exporter),
  mxaccess (asb_session, session, lib).
- Bracket-text being interpreted as link refs (e.g. `body[17]` →
  `` `body[17]` ``).
- Private-item references in public docs (CALLBACK_BROADCAST_CAPACITY,
  recover_connection_core, mxvalue_to_writevalue) reduced to
  backtick-text since they aren't part of the public API.

`RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps` now
exits clean. Workspace 759 tests pass; clippy clean.

Defers `#![warn(missing_docs)]` lint to a future pass — the cleanup
target is the broken-link warnings, which are signal; missing-docs
would surface hundreds of low-priority public-item gaps that are out
of scope for this F-number.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:39:51 -04:00
Joseph Doherty 34045c2f6d [F37] mxaccess: AsbSession::subscribe_buffered returns Unsupported
ASB has no `SetBufferedUpdateInterval` analogue — the per-monitored-
item `MinimalMonitoredItem::sample_interval` plays the cadence-knob
role. Calling `subscribe_buffered` on an ASB session now returns
`Error::Unsupported { transport: TransportKind::Asb, operation: ... }`
synchronously, without touching the wire.

The error-construction logic is split into a free fn
`unsupported_subscribe_buffered_error()` so the gate's exact shape
is unit-testable without spinning up a live authenticator + transport
fake. New unit test asserts both the variant tag and that the
operation message names the unsupported method + hints at the
`sample_interval` analogue.

Workspace 758 → 759 tests, clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:32:45 -04:00
Joseph Doherty 2546710604 design/followups: add F35-F44 for M6 implementation plan
10 new followups decompose M6 (compatibility shim + production
hardening) into parallel-safe sub-streams:

- F35: mxaccess-compat LMXProxyServer-shaped facade (18 methods over
  Session/AsbSession)
- F36: Session::subscribe_buffered NMX path per R2 single-sample
- F37: ASB subscribe_buffered capability gate
- F38: counting-allocator cargo bench harness for R12 target
- F39: zero-copy codec pass (depends on F38)
- F40: optional metrics feature
- F41: cargo public-api baseline (depends on F35/F36/F37/F39/F40)
- F42: cargo doc cleanup pass
- F43: cargo publish --dry-run all crates (depends on F41)
- F44: decode buffered batch + suspend captures (077, 079-082, 094)
  for R2/R5 evidence

Parallelization: Wave 1 = F35/F36/F37/F38/F40/F42/F44 (different
crates / different concerns); Wave 2 = F39 (needs F38's bench);
Wave 3 = F41 (needs API stable); Wave 4 = F43 (release).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:28:38 -04:00
Joseph Doherty bedad57b4e design/followups: move F18 (M5 meta-tracker) to Resolved
rust / build / test / clippy / fmt (push) Has been cancelled
Trim the planning content (sub-stream breakdown table, parallel-safety
analysis, risk-driven sequencing, "Resolves when" gate) since M5 is
done. Keep the closure verdict, M5 DoD checklist showing the actual
state at close, sub-followup closeout list (F19-F26 + F28/F29/F30/
F31/F32/F33/F34), cumulative execution log, and the architectural
note explaining why AsbSession stays parallel to the NMX Session
rather than unified — that's load-bearing for future maintenance.

Open section now contains only F3 (cross-domain NTLM Type1/2/3
fixture, permanently external-blocked on this single-domain dev host
— resolution requires multi-domain Windows lab not available here).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:09:03 -04:00