Commit Graph

143 Commits

Author SHA1 Message Date
Joseph Doherty 1f07da2e12 tools: upgrade Get-InfisicalSecret to stream separation, drop banner regex
rust / build / test / clippy / fmt (push) Has been cancelled
rust / cargo public-api drift check (F41) (push) Has been cancelled
Earlier fix (commit 047125b) filtered the infisical CLI's
"A new release of infisical is available" upgrade banner from
captured output via regex matching. That worked but coupled the
filter to specific banner-pattern strings — a future banner shape
("Update available" / "New version detected" / a localized
message) would slip through and break NTLM Type1 auth again.

The principled fix is to stop capturing stderr at all.
PowerShell's call operator (`&`) keeps stdout and stderr on
separate streams unless explicitly merged; the previous code's
`2>&1` was the actual mistake. Without it, the banner stays in
the error stream (visible on the console for diagnostics) and
the captured `$value` contains only the script's stdout — which
for `Get-Secret.ps1` is just the secret value from `infisical
secrets get --plain`.
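Rust code that spawns the CLI can get the same separation from `std::process::Command`. `output()` captures stdout and stderr by default, but a stream you configure explicitly keeps that configuration. Setting stderr to `Stdio::inherit()` sends the banner to the console, and only stdout is returned. This is a minimal sketch: `capture_stdout_only` is a hypothetical helper, and the project's actual tool is the PowerShell script.

```rust
use std::io;
use std::process::{Command, Stdio};

/// Runs `cmd`, lets its stderr flow straight to our console, and returns only
/// its stdout with trailing newlines trimmed. Hypothetical helper.
fn capture_stdout_only(cmd: &mut Command) -> io::Result<String> {
    // Stderr is explicitly inherited, so `output()` captures stdout only.
    // Banners written to stderr stay visible but never reach the value.
    // This is the Rust analogue of *not* writing `2>&1`.
    let out = cmd.stderr(Stdio::inherit()).output()?;
    if !out.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("child exited with {}", out.status),
        ));
    }
    let text = String::from_utf8(out.stdout)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    Ok(text.trim_end_matches(|c| c == '\r' || c == '\n').to_string())
}
```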

Verified: live re-run of F54 (lmx_write_complete_live) passes
post-change with `MX_TEST_DOMAIN='DESKTOP-6JL3KKO'` clean and
the banner visibly logged to console (stderr) above each [SET]
line. No regex coupling to a specific banner-pattern remains.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 09:30:52 -04:00
Joseph Doherty 047125bc11 M6 live verification: re-run all 5 steps + filter infisical banner
Four fixes (three doc, one script) pinned by re-running today's full live-test sweep:

1. Bump status header from 2026-05-06 to "re-run 2026-05-07" with a
   note that all 5 steps still pass against the live AVEVA install.
   The first run of step 1 + step 5 today failed with
   `Error::Status { detail: 5 }` (DCE/RPC fault 0x00000005) traced
   to MX_TEST_DOMAIN being polluted with the infisical CLI's
   "A new release of infisical is available" upgrade banner. The
   banner was being concatenated onto the domain string by
   Setup-LiveProbeEnv.ps1's `2>&1` capture, causing NTLM Type1 to
   send a malformed domain field that NmxSvc rejected.

2. Fix tools/Setup-LiveProbeEnv.ps1 — Get-InfisicalSecret now splits
   captured output on newlines, filters lines matching the
   "^A new release of infisical is available" / "^Please upgrade"
   banner patterns, and returns the last non-empty line (the actual
   secret value from `infisical secrets get --plain`). Robust to
   future banner messages of similar shape.

3. Fix two drifted line citations in docs/M6-live-verification.md:
   `recover_connection_core (session.rs:1428-...)` is now at line
   1374 after F56/F45/F47 edits — strip the line number, keep the
   function name (`Session::recover_connection_core`). Same for
   `Session::unsubscribe (session.rs:2261)`.

4. Add "Workspace gate (no live infra needed)" subsection to the
   "Reproducing locally" recipe so a fresh contributor sees the
   full V1 verification recipe (live + workspace gate) in one place.
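The filter described in item 2 can be sketched in Rust as follows. The real helper is PowerShell. `last_secret_line` is a hypothetical name, and the two prefixes are taken from the patterns quoted above. The sketch also shows the weakness that the later stream-separation change removes: a banner that is not on the prefix list is returned as if it were the secret.

```rust
/// Banner prefixes the filter recognises (taken from the commit text).
const BANNER_PREFIXES: [&str; 2] = [
    "A new release of infisical is available",
    "Please upgrade",
];

/// Splits captured output into lines, drops known banner lines, and returns
/// the last non-empty remaining line. Hypothetical helper.
fn last_secret_line(captured: &str) -> Option<&str> {
    captured
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .filter(|line| !BANNER_PREFIXES.iter().any(|p| line.starts_with(*p)))
        .last()
}
```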

All 5 live tests pass post-fix:
  - F36 buffered subscribe (drained 1 raw NMX message; no scan
    activity on TestChangingInt today, matching the 2026-05-06 baseline)
  - F45 buffered recovery replay (2 pre + 2 post DataUpdate frames)
  - F47 buffered unsubscribe skip (returned Ok)
  - F40 metrics smoke (4 expected metric names present)
  - F54 OnWriteComplete (status detail 9 = WRITE_COMPLETE_OK)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 09:17:46 -04:00
Joseph Doherty d668d5b7b1 mxaccess: fix 9 unit tests broken silently by F56's ensure_publisher_connected
Workspace gate sweep flagged 9 unit tests in mxaccess::session that
had been silently failing since F56 landed (commit 5e11b30). Root
cause: F56 added ensure_publisher_connected (issuing
INmxService2::Connect + AddSubscriberEngine before each
AdviseSupervisory) but the in-process fake-NMX-server fixtures'
responses vec sizes weren't bumped. Once the fake server ran out of
responses mid-handshake, the connection was closed and the client
got ConnectionAborted (10053).

Fix: bumped each test's unauthenticated_server / recording_server
response count to cover the new pair of RPCs (+2 for every test except
unsubscribe_skips_un_advise_for_buffered_subscription). Tests touched:

  - subscribe_then_unsubscribe_round_trip (2 → 4 responses)
  - two_subscribes_produce_distinct_correlation_ids (4 → 6)
  - subscription_stream_yields_data_change_for_matching_correlation (1 → 3)
  - subscription_stream_filters_out_mismatched_correlation_for_status (1 → 3)
  - subscription_stream_keeps_data_update_regardless_of_correlation (1 → 3)
  - subscribe_populates_registry_unsubscribe_clears_it (2 → 4)
  - read_returns_first_data_change_within_timeout (2 → 4)
  - read_returns_timeout_when_no_data_arrives (2 → 4)
  - unsubscribe_skips_un_advise_for_buffered_subscription (2 → 3
    + mid-flow assertion bumped from len()==1 to len()==3)

The two_subscribes test only adds 2 (not 4) extra responses because
the second subscribe hits the per-engine publisher_endpoints cache.
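A queue-backed fake reproduces the failure mode. Each RPC consumes one canned response, and an empty queue drops the connection. This is a minimal sketch: `FakeNmxServer`, its `rpc` method, and the assumed subscribe/unsubscribe RPC sequence are hypothetical stand-ins for the crate's test fixtures.

```rust
use std::collections::VecDeque;
use std::io;

/// Hypothetical fake server: one canned response per inbound RPC.
struct FakeNmxServer {
    responses: VecDeque<Vec<u8>>,
}

impl FakeNmxServer {
    fn with_responses(count: usize) -> Self {
        Self { responses: (0..count).map(|_| vec![0u8; 4]).collect() }
    }

    /// Answers one RPC. Once the queue is empty the connection is dropped,
    /// which the client observes as ConnectionAborted (WSAECONNABORTED, 10053).
    fn rpc(&mut self, _name: &str) -> io::Result<Vec<u8>> {
        self.responses
            .pop_front()
            .ok_or_else(|| io::Error::from(io::ErrorKind::ConnectionAborted))
    }
}

/// Assumed post-F56 sequence: Connect and AddSubscriberEngine precede the advise.
fn subscribe_then_unsubscribe(server: &mut FakeNmxServer) -> io::Result<()> {
    for rpc in ["Connect", "AddSubscriberEngine", "AdviseSupervisory", "UnAdvise"] {
        server.rpc(rpc)?;
    }
    Ok(())
}
```

With the pre-F56 count of 2 responses, the fake runs dry during the advise and the call fails with `ConnectionAborted`. With 4 responses, the call succeeds.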

Workspace gate post-fix: 847 tests pass, 0 failed, 9 ignored
(live-only). Clippy + bench clean. Pinned in
docs/M6-live-verification.md "Workspace gate (2026-05-07)" so the
test-fixture lag is recorded for future audits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 04:44:18 -04:00
Joseph Doherty 9ed4700eb4 docs: audit pass — fix stale F-number references
Walked all 18 docs/*.md for stale followup references and outdated
TODO markers. Two real fixes:

docs/M6-buffered-evidence.md:
- Three references to "F45" for the LMX-proxy Suspend/Activate
  Frida instrumentation were stale. That work was actually filed
  as F46 when the followups list got renumbered (F45 was reassigned
  to "Recovery replay should re-issue RegisterReference for
  buffered subscriptions"). F46 landed in commit 808fea1, and the
  follow-up live capture landed as F50 in commit 349e217.
- Updated all three references to point at F46 + F50 + the
  resolution evidence in docs/F50-suspend-activate-evidence.md.
- Renamed the "Sub-followup filed: F45" section to
  "Sub-followup F46 — RESOLVED 2026-05-06" with the verdict from
  the live capture.

docs/M6-live-verification.md:
- "Open work" section listed F50 as a residual gap. F50 closed
  2026-05-06 per docs/F50-suspend-activate-evidence.md. Updated
  to "None. F49 sweep complete; F50 closed".

Other docs scanned, no real staleness:
- Capture-Run-2026-04-25.md, Current-Sprint-State.md,
  DotNet10-Native-Library-Plan.md — historical snapshot docs,
  intentionally pinned to their dates.
- ASB-Native-Integration-Decision.md, MxNativeSession-API.md,
  NMX-COM-Contracts.md, MXAccess-* — describe the .NET reference's
  state; "not yet" wording reflects the .NET planning context, not
  the Rust port's current state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 04:32:28 -04:00
Joseph Doherty 8b50c0fd43 CHANGELOG: curate post-F43 work into V1 entry
The CHANGELOG was cut at F43 and didn't reflect the work that landed
afterwards on the same V1 milestone. Update the V1 [Unreleased] entry
to cover:

Added (since F43):
- F45 — recovery replay re-issues RegisterReference for buffered subs
- F47 — unsubscribe skips UnAdvise for buffered subs
- F49 / F50 / F51 — live verification + Suspend/Activate captures +
  ASB type-matrix expansion with new fixture round-trip tests
- F52.{1,2,3} — codec performance optimisations (BytesMut output,
  thread-local name-signature cache, caller-supplied scratch buffer)
- F54 — per-operation correlation + compat OnWriteComplete fan-out
- F55 — DCOM-managed INmxSvcCallback sink (Path A)
- F56 — Connect/AddSubscriberEngine round-trip in subscribe path
- MxStatus synthesizer kernel ported (settles R3/R4)

Known limitations (post-resolution):
- Drop F45 / F46 / R3+R4 — all resolved.
- Add F53 protocol-crate missing-docs deferral.
- F3 entry now links the new docs/F3-cross-domain-ntlm-recipe.md.

Publish-order section keeps the DAG but flags F48 (no crates.io
publish) up front so anyone reading the recipe knows it's hygiene,
not release prep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 04:27:59 -04:00
Joseph Doherty cc99a2d9f0 followups: trim F56's stale pre-resolution analysis
F56's body had a "Resolved 2026-05-06" header followed by ~40 lines
of pre-resolution debugging analysis that contradicted the
resolution: "Likely revised root cause" pointing at DCOM sink IID
mismatches, "But zero 0x33 DataUpdate frames ever arrive", "Action
items for whoever picks F56 up", "Definition of done", "Resolves
when" — all written before the actual root cause (missing
EnsurePublisherConnected round-trip) was identified.

Trim to: status + actual root cause + fix that landed + live
verification + the codec fixes that also landed independently.
The dead-end debugging branches are preserved in this file's git
history for archeology; F56 body now reads as a coherent closeout.

Also fixed line 108's "See Resolved section below for the full
closeout" pointer — the closeout *is* the body; F56 was never moved
to Resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 03:31:26 -04:00
Joseph Doherty ddebab2c2d docs: F3 cross-domain NTLM provisioning recipe
Self-contained doc at docs/F3-cross-domain-ntlm-recipe.md for whoever
picks F3 up on hardware with two AD forests + a forest trust. Covers:

- Lab topology (LAB-A resource forest with AVEVA install + LAB-B
  account forest with the probe user, bidirectional forest trust).
- DC + DNS + trust + user provisioning steps (Install-ADDSForest,
  Add-DnsServerConditionalForwarderZone, New-ADTrust, New-ADUser).
- Capture procedure for both the Rust and .NET probes under a
  `runas /netonly` cross-domain token, with Wireshark NTLMSSP guidance.
- Fixture layout under crates/mxaccess-rpc/tests/fixtures/cross-domain-ntlm/.
- Round-trip test skeleton (replay the captured Type 2 → regenerate
  Type 3 → assert byte-equality against the captured Type 3).
- Redaction checklist for the captured bytes.
- Why F3 is "evidence work" not "codec work" — the AV pair parser
  is shape-agnostic, so the codec path is already correct; the
  fixture is a regression net for any future drift.
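For reference, MS-NLMP defines the TargetInfo field of an NTLM CHALLENGE (Type 2) message as a flat list of AV_PAIRs. Each pair is `AvId: u16 LE`, `AvLen: u16 LE`, then `AvLen` value bytes, and the list ends with `MsvAvEOL` (AvId 0). In this context, "shape-agnostic" means the parser walks that structure without interpreting the ids. A cross-domain challenge therefore only changes the values it carries, such as `MsvAvNbDomainName` (AvId 2). This is a minimal sketch; `parse_av_pairs` is hypothetical and is not the mxaccess-rpc function.

```rust
/// Walks an NTLM TargetInfo AV_PAIR list (MS-NLMP 2.2.2.1) and returns
/// (AvId, value) pairs, stopping at MsvAvEOL. Unknown ids are kept as-is.
fn parse_av_pairs(mut buf: &[u8]) -> Result<Vec<(u16, &[u8])>, &'static str> {
    let mut pairs = Vec::new();
    loop {
        if buf.len() < 4 {
            return Err("truncated AV_PAIR header");
        }
        let id = u16::from_le_bytes([buf[0], buf[1]]);
        let len = u16::from_le_bytes([buf[2], buf[3]]) as usize;
        buf = &buf[4..];
        if id == 0 {
            // MsvAvEOL terminates the list (its AvLen is 0).
            return Ok(pairs);
        }
        if buf.len() < len {
            return Err("AV_PAIR value runs past end of buffer");
        }
        pairs.push((id, &buf[..len]));
        buf = &buf[len..];
    }
}
```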

F3 entry in design/followups.md and R8 in design/70-risks-and-open-questions.md
both now point at the recipe so a future contributor doesn't have
to reconstruct the lab topology from the followup analysis alone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 02:40:06 -04:00
Joseph Doherty 73e2bd8771 followups: status snapshot for the Open section
After F52 closed, every entry in the Open section except F3 has a
`**Status:**` line documenting its own resolution (Resolved 2026-05-06,
or Out-of-scope). At a glance the section misleadingly looks like 8
live items.

Add a header snapshot calling out that only F3 — cross-domain NTLM
fixture, externally blocked on a second AD domain — is genuinely open.
The other entries stay where they are because the F-numbers in their
analysis are referenced from other followups; moving them to
`## Resolved` would orphan that context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:57:58 -04:00
Joseph Doherty ceeaeefa71 [F52.3] mxaccess-codec: caller-supplied scratch buffer for write encoder
rust / build / test / clippy / fmt (push) Has been cancelled
rust / cargo public-api drift check (F41) (push) Has been cancelled
Adds `write_message::encode_into_bytes_mut` (and the timestamped
variant), which writes the encoded body into a caller-supplied
`BytesMut`. The buffer is cleared and resized in place each call;
once it has grown to the largest body the session will produce, it
allocates nothing further.

A session that holds a single `BytesMut` and reuses it across writes:

  - Int32 / Float32 / Float64: 2 → 1 allocs/op
    (only the `encode_scalar_value` scratch `Vec<u8>` remains)
  - Boolean: 1 → 0 allocs/op
    (no per-value scratch — the literal payload is a stack `[u8; 4]`)

Bench delta in `design/M6-bench-baseline.md` § F52.3. The
`encode_scalar_value` Vec is the remaining 1 alloc/op for fixed-width
scalars; eliminating it would require inlining the LE-bytes write
into the body slice (left for a follow-up since the F52 spec only
asks for 2 → 1).
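Below is a sketch of the reuse pattern, combined with the follow-up idea of writing the LE bytes directly into the body slice. It is minimal and makes two assumptions: it uses `Vec<u8>` in place of `bytes::BytesMut` so that it runs with only the standard library, and its 8-byte body layout (a hypothetical 4-byte type tag followed by the value) is illustrative rather than the real MX write format.

```rust
/// Hypothetical fixed-width body: 4-byte LE type tag + 4-byte LE value.
const INT32_TAG: u32 = 4;
const INT32_BODY_LEN: usize = 8;

/// Encodes `value` into a caller-owned buffer. `clear` + `resize` keep the
/// existing capacity, so once the buffer has grown it never reallocates.
/// The value's LE bytes go straight into the body slice, so no per-value
/// scratch `Vec<u8>` is needed either.
fn encode_int32_into(buf: &mut Vec<u8>, value: i32) {
    buf.clear();
    buf.resize(INT32_BODY_LEN, 0);
    let body = &mut buf[..];
    body[..4].copy_from_slice(&INT32_TAG.to_le_bytes());
    body[4..8].copy_from_slice(&value.to_le_bytes());
}
```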

Resolves F52 (all three optimisations landed: 4e76b44 F52.1,
a0fa5be F52.2, this commit F52.3). Existing `encode` / `encode_to_bytes_mut`
public surface unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:53:07 -04:00
Joseph Doherty a0fa5bedfd [F52.2] mxaccess-codec: thread-local name-signature cache
Adds a thread-local `HashMap<String, u16>` cache inside
`compute_name_signature`. Repeated calls with the same name (the hot
path inside `MxReferenceHandle::from_names`) skip the `to_lowercase`
allocation and the CRC-16/IBM walk entirely. Bounded at 1024 entries
per thread; on overflow the cache is cleared rather than LRU-evicted,
since any sane workload re-fills only the names it actively uses.
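Here is a minimal sketch of that cache shape. It assumes the signature is CRC-16/IBM (also catalogued as CRC-16/ARC: reflected polynomial 0xA001, init 0, no final XOR) computed over the lowercased name's UTF-8 bytes. The real `compute_name_signature` may hash a different encoding. `name_signature` and `cache_len` are hypothetical names.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

const CACHE_CAPACITY: usize = 1024;

thread_local! {
    // Keyed by the caller's original spelling, so a hit needs no allocation.
    static SIGNATURES: RefCell<HashMap<String, u16>> = RefCell::new(HashMap::new());
}

/// Bitwise CRC-16/IBM (aka CRC-16/ARC): reflected poly 0xA001, init 0.
fn crc16_ibm(data: &[u8]) -> u16 {
    let mut crc: u16 = 0;
    for &byte in data {
        crc ^= u16::from(byte);
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0xA001 } else { crc >> 1 };
        }
    }
    crc
}

fn name_signature(name: &str) -> u16 {
    SIGNATURES.with(|cache| {
        let mut cache = cache.borrow_mut();
        if let Some(&sig) = cache.get(name) {
            return sig; // warm path: no `to_lowercase`, no CRC walk
        }
        let sig = crc16_ibm(name.to_lowercase().as_bytes());
        if cache.len() >= CACHE_CAPACITY {
            cache.clear(); // overflow: drop everything instead of tracking LRU order
        }
        cache.insert(name.to_owned(), sig);
        sig
    })
}

fn cache_len() -> usize {
    SIGNATURES.with(|cache| cache.borrow().len())
}
```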

`MxReferenceHandle::from_names` drops from 2 → 0 allocs/op once warm
(bench delta in `design/M6-bench-baseline.md` § F52.2). Cold-path
behaviour is unchanged: first call with a new name still pays the
`to_lowercase` + cache-key `String` allocations.

Two new tests pin the cache: cache-hit returns the same value as
cold-compute, and cache overflow doesn't break correctness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:50:07 -04:00
Joseph Doherty 4e76b44391 [F52.1] mxaccess-codec: BytesMut output buffer for write encoder
Adds `write_message::encode_to_bytes_mut` (and the timestamped variant)
returning a freshly-allocated `BytesMut`. Allocation count is identical
to `encode` (2 allocs/op for fixed-width scalars); the benefit is
downstream — consumers can `BytesMut::split_to` / `freeze` and forward
the body bytes to a wire-level sink without an intermediate copy.

The body builders (`encode_boolean` / `encode_fixed` / `encode_variable`
/ `encode_array`) were refactored to fill a pre-sized `&mut [u8]`
rather than each allocating their own `Vec<u8>`. The dispatcher
computes the body size up front via small `*_body_size` helpers and
resizes the destination buffer (Vec or BytesMut) once. This is also
the prerequisite refactor for F52.3.
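The size-then-fill split works like this: each builder reports its body size and fills a caller-provided `&mut [u8]`, so the dispatcher resizes the destination exactly once. This is a minimal sketch. The `Value` enum, the 1-byte tag values, the 4-byte boolean literal, and the u32 length prefix for strings are all hypothetical and do not reflect the codec's actual layout.

```rust
/// Hypothetical subset of writable values.
enum Value<'a> {
    Boolean(bool),
    Int32(i32),
    Float64(f64),
    Str(&'a str),
}

/// Size of the body each builder will write (tag byte + payload).
fn body_size(value: &Value) -> usize {
    1 + match value {
        Value::Boolean(_) => 4, // stack [u8; 4] literal
        Value::Int32(_) => 4,
        Value::Float64(_) => 8,
        Value::Str(s) => 4 + s.len(), // u32 LE length prefix + UTF-8 bytes
    }
}

/// Fills a slice of exactly `body_size(value)` bytes; allocates nothing.
fn fill_body(value: &Value, out: &mut [u8]) {
    let (tag, payload) = out.split_at_mut(1);
    match value {
        Value::Boolean(b) => {
            tag[0] = 1;
            let literal: [u8; 4] = if *b { [1, 0, 0, 0] } else { [0; 4] };
            payload.copy_from_slice(&literal);
        }
        Value::Int32(v) => {
            tag[0] = 2;
            payload.copy_from_slice(&v.to_le_bytes());
        }
        Value::Float64(v) => {
            tag[0] = 3;
            payload.copy_from_slice(&v.to_le_bytes());
        }
        Value::Str(s) => {
            tag[0] = 4;
            payload[..4].copy_from_slice(&(s.len() as u32).to_le_bytes());
            payload[4..].copy_from_slice(s.as_bytes());
        }
    }
}

/// Dispatcher: size once, resize once, fill in place.
fn encode(value: &Value, dest: &mut Vec<u8>) {
    let size = body_size(value);
    dest.clear();
    dest.resize(size, 0);
    fill_body(value, dest);
}
```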

Bench delta in `design/M6-bench-baseline.md` § F52.1; existing
`encode` row unchanged at 2 allocs/op. All 265 round-trip tests
pass unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:46:02 -04:00
Joseph Doherty c7505f9570 [F51] live ASB type-matrix: provision UDAs + capture wire fixtures + round-trip tests
rust / build / test / clippy / fmt (push) Has been cancelled
rust / cargo public-api drift check (F41) (push) Has been cancelled
Provisioned 7 new UDAs on $TestMachine via wwtools/graccesscli
object uda add (then deployed to TestMachine_001):

  TestFloat          MxFloat        scalar
  TestFloatArray     MxFloat        array (4)
  TestDouble         MxDouble       scalar
  TestDoubleArray    MxDouble       array (4)
  TestDateTime       MxTime         scalar
  TestDuration       MxElapsedTime  scalar
  TestDurationArray  MxElapsedTime  array (4)

New crates/mxaccess/examples/asb-type-matrix.rs reads all 14 tags
(7 pre-existing + 7 new) in a single batch and dumps the live
AsbVariant bytes per tag when MX_ASB_DUMP_FIXTURES=<dir> is set.
Single-attempt register (no retry — F31 InvalidConnectionId
cool-down re-arms on every retry, making backoff
counter-productive; if the cool-down is engaged, wait 60+ seconds
without ASB activity then re-run).

Captured live evidence (single cold-start run, all 14 register
calls returned error_code=0x0000):

  TestChangingInt   type_id=4  (Int32)        length=4   payload=4
  TestAlarm001      type_id=17 (Boolean)      length=1   payload=1
  MachineCode       type_id=10 (String)       length=30  payload=30
  TestFloat         type_id=8  (Float)        length=4   payload=4
  TestDouble        type_id=9  (Double)       length=8   payload=8
  TestDateTime      type_id=11 (DateTime)     length=8   payload=8
  TestDuration      type_id=12 (ElapsedTime)  length=8   payload=8

  TestIntArray, TestBoolArray, TestStringArray, TestDateTimeArray,
  TestFloatArray, TestDoubleArray, TestDurationArray
                    type_id=0 length=0 payload=0
                    (provisioned but no value written yet)

Per-tag fixture .bin files saved under
crates/mxaccess-codec/tests/fixtures/f51-type-matrix/ — full
14-byte to 40-byte AsbVariant byte sequences (i32 type_id LE +
i32 length LE + payload bytes).

crates/mxaccess-codec/tests/f51_type_matrix_parity.rs round-trips
each scalar fixture: decode -> re-encode -> assert byte-equal +
type_id / length pin. Tests skip with [skip] message when fixtures
are absent (so the suite passes on a fresh checkout without live
captures). 7 scalar tests pass against the captured fixtures.
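Below is a minimal sketch of the decode -> re-encode check. It takes the fixture layout stated above at face value: i32 type_id LE, i32 length LE, then `length` payload bytes. `AsbVariant`, `decode`, `encode`, and `assert_round_trip` are simplified hypothetical stand-ins, not the mxaccess-codec types.

```rust
/// Simplified stand-in for a decoded AsbVariant.
#[derive(Debug, PartialEq)]
struct AsbVariant {
    type_id: i32,
    payload: Vec<u8>,
}

fn decode(bytes: &[u8]) -> Result<AsbVariant, String> {
    if bytes.len() < 8 {
        return Err(format!("need 8 header bytes, got {}", bytes.len()));
    }
    let type_id = i32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let length = i32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    if length < 0 {
        return Err(format!("negative length {length}"));
    }
    let length = length as usize;
    let payload = bytes
        .get(8..8 + length)
        .ok_or_else(|| format!("payload truncated: want {length} bytes"))?;
    Ok(AsbVariant { type_id, payload: payload.to_vec() })
}

fn encode(v: &AsbVariant) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + v.payload.len());
    out.extend_from_slice(&v.type_id.to_le_bytes());
    out.extend_from_slice(&(v.payload.len() as i32).to_le_bytes());
    out.extend_from_slice(&v.payload);
    out
}

/// decode -> re-encode -> assert byte-equal, plus type_id / length pins.
/// Trailing bytes after the payload make the byte-equality check fail.
fn assert_round_trip(fixture: &[u8], expected_type_id: i32, expected_len: usize) {
    let v = decode(fixture).expect("fixture decodes");
    assert_eq!(v.type_id, expected_type_id);
    assert_eq!(v.payload.len(), expected_len);
    assert_eq!(encode(&v), fixture);
}
```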

Array tags excluded from round-trip pinning because the live
engine returns empty payloads for unwritten arrays. Codec-side
array round-trip is covered by asb_variant's existing synthetic-
payload unit tests.

docs/galaxy-test-fixtures.md inventories all $TestMachine UDAs
(pre-existing + F51-provisioned), the graccesscli provisioning
recipe, the fixture-regeneration pattern, and the F31 cool-down
caveat.

design/followups.md F51 marked resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 15:27:31 -04:00
Joseph Doherty 8bd66bbe65 [F53 measurement] document protocol-crate missing-docs magnitude
rust / build / test / clippy / fmt (push) Has been cancelled
rust / cargo public-api drift check (F41) (push) Has been cancelled
Enabled #![warn(missing_docs)] on each of the 7 protocol crates to
measure how many one-line doc comments it would take to satisfy it:

  mxaccess-asb         422
  mxaccess-nmx         398
  mxaccess-callback    371
  mxaccess-galaxy      229
  mxaccess-codec       205
  mxaccess-rpc         147
  mxaccess-asb-nettcp  111
                     -----
  Total              1883

Reverted the lint enables — most of those are protocol-internal
types (struct fields on wire-shape records, enum variants on opcode
discriminators) whose meaning is already documented at the
consumer-facing layer. Filling 1883 one-liners adds noise without
consumer value, and forcing them as errors via RUSTDOCFLAGS would
block routine cargo doc runs.

design/followups.md F53 entry updated with the measured numbers
and the explicit "stays off indefinitely" verdict. If a future
contributor wants per-crate enforcement, the recipe in the strategy
paragraph (allow(missing_docs) on protocol-internal modules,
warn(missing_docs) on the re-export surface) is still valid.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 13:36:03 -04:00
Joseph Doherty 349e217ea3 [F50] live Suspend/Activate captures — Suspend wires opcode 0x2D, Activate client-side
Re-ran analysis/frida/mx-nmx-trace.js (with the F46 hooks for
LmxProxy.dll!CLMXProxyServer.Suspend / .Activate) against
MxTraceHarness on the local AVEVA install. Two captures landed:

- captures/123-frida-suspend-advised-instrumented/
  Scenario: --scenario=suspend-advised --tag=TestChildObject.ScanState

  After mx.suspend.begin/end at 17:23:51.949Z, NMX PutRequest fires
  ~140ms later with body:
    2d 01 00                                       command 0x2D, version 0x0001
    cd 2a ee ec b2 76 06 4f b4 58 5c a0 2d f7 a8 93 16-byte correlation_id (matches the prior AdviseSupervisory)
    01 00 05 00 01 00 02 00 01 00 69 00 0a 00      engine + handle + attribute / property ids
    47 92 00 00 03 00 00 00                        trailer

  TransferData wraps it; HRESULT 0 returned; ProcessDataReceived
  callback delivers a 50-byte op-status frame; LMX surfaces it
  through CUserConnectionCallback.OperationComplete. Suspend is
  unambiguously server-side wire op 0x2D.

- captures/124-frida-activate-advised-instrumented/
  Scenario: --scenario=activate-advised --tag=TestChildObject.ScanState

  Activate fires at 17:26:02.982Z and returns Success synchronously
  with no NMX traffic. The next NMX activity is 7+ seconds later
  (harness teardown). Activate against a non-suspended item is
  client-side only on this build.
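The captures/123 PutRequest body can be rebuilt by plain concatenation, which makes it a useful regression pin if a `Session::suspend` encoder is ever added. This is a minimal sketch. `encode_suspend_body` and `parse_header` are hypothetical. The 14-byte id block and the 8-byte trailer are passed through as opaque bytes because the capture does not break them down field by field.

```rust
const CMD_SUSPEND: u8 = 0x2D;
const SUSPEND_VERSION: u16 = 0x0001;

/// Builds: command byte, u16 LE version, 16-byte correlation id,
/// engine/handle/attribute/property id block, trailer.
fn encode_suspend_body(correlation_id: &[u8; 16], ids: &[u8], trailer: &[u8]) -> Vec<u8> {
    let mut body = Vec::with_capacity(3 + 16 + ids.len() + trailer.len());
    body.push(CMD_SUSPEND);
    body.extend_from_slice(&SUSPEND_VERSION.to_le_bytes());
    body.extend_from_slice(correlation_id);
    body.extend_from_slice(ids);
    body.extend_from_slice(trailer);
    body
}

/// Reads back the fixed-position header: (command, version, correlation id).
fn parse_header(body: &[u8]) -> Option<(u8, u16, [u8; 16])> {
    if body.len() < 19 {
        return None;
    }
    let mut correlation_id = [0u8; 16];
    correlation_id.copy_from_slice(&body[3..19]);
    Some((body[0], u16::from_le_bytes([body[1], body[2]]), correlation_id))
}
```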

The harness's activate-advised scenario doesn't sequence
Suspend-then-Activate, so we don't have direct evidence for
Activate-after-Suspend. Circumstantial reasoning: since Suspend
goes server-side with a state change, Activate likely also goes
server-side to revert it. If direct evidence becomes needed, add a new
suspend-then-activate scenario to MxTraceHarness/Program.cs and
re-run.

design/70-risks-and-open-questions.md R5 moves to "settled —
Suspend is wire op 0x2D, Activate behaviour is conditional",
severity downgraded P2 -> P3 (no public Session::suspend /
Session::activate API exists today; if added later, 0x2D is the
encoder target).

design/followups.md F50 marked resolved.

docs/F50-suspend-activate-evidence.md: per-capture byte-level
evidence + repro recipe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 13:29:40 -04:00
Joseph Doherty b62ffc8c5d [F48] mark out-of-scope: internal usage only, no crates.io publish
Maintainer confirmed on 2026-05-06 that the project is internal-use only:
workspace stays at version "0.0.0", consumers depend via path or
git, not crates.io. F48's actual publish goal is dropped.

design/followups.md F48 entry: replace the "P1 release driver"
framing with "Out of scope" + a pointer to the recipe doc in case
this ever changes.

design/F48-publish-dry-run.md: add a banner at the top explaining
the doc is now retained as a workspace-hygiene record (cargo
package --list per crate produces clean tarballs, no captures or
big files), not as release prep. The "What the actual V1 publish
needs" section reframed as "If a publish ever does become a goal —
recipe" so the steps survive without implying they're scheduled.

No code change. F49 / F53 / F55 / F56 status unchanged — those
weren't release-cut-gated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 13:13:24 -04:00
Joseph Doherty e77db4306a [F48 dry-run] validate publish chain on workspace 0.0.0
cargo publish --dry-run on each of the 9 workspace crates:
- Tier 1 leaves (mxaccess-codec, mxaccess-rpc, mxaccess-asb-nettcp)
  pass cleanly: cargo assembles each tarball, and the only failure is
  the dry-run upload abort.
- Tiers 2 + 3 (galaxy, callback, asb, nmx, mxaccess, mxaccess-compat)
  surface the documented "no matching package" registry-lookup
  failure because workspace internal deps are pinned at version
  "0.0.0" which doesn't exist on crates.io. Expected; resolves at
  actual publish time once the leaves are uploaded and indexed.

cargo package --list confirms each crate ships only source + tests
+ small round-trip fixtures. No captures, decompiled binaries, or
accidental big files.

design/F48-publish-dry-run.md captures the per-crate run output,
the per-crate file count, and the V1 publish recipe (bump 0.0.0
→ 0.1.0 across workspace + internal-dep pins, publish in tier
order, wait for indexing between tiers, tag).
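"Publish in tier order" amounts to a topological layering of the internal dependency graph: each crate goes in the first tier after all of its workspace dependencies. Below is a minimal sketch using Kahn's algorithm, one layer at a time. `publish_tiers` is hypothetical, and any dependency edges passed to it are the caller's assumptions. The commit only states which crates are the tier-1 leaves.

```rust
use std::collections::BTreeMap;

/// Groups crates into publish tiers. The first tier has no workspace deps;
/// each later tier depends only on crates in earlier tiers. Dependencies
/// not present as keys are treated as already published. Err on a cycle.
fn publish_tiers<'a>(
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
) -> Result<Vec<Vec<&'a str>>, String> {
    let mut remaining: BTreeMap<&'a str, Vec<&'a str>> = deps.clone();
    let mut tiers = Vec::new();
    while !remaining.is_empty() {
        // Ready = every dependency is already in an earlier tier.
        let ready: Vec<&'a str> = remaining
            .iter()
            .filter(|(_, ds)| ds.iter().all(|d| !remaining.contains_key(d)))
            .map(|(name, _)| *name)
            .collect();
        if ready.is_empty() {
            return Err(format!("dependency cycle among {:?}", remaining.keys()));
        }
        for name in &ready {
            remaining.remove(name);
        }
        tiers.push(ready);
    }
    Ok(tiers)
}
```

Suppose, for illustration, that `mxaccess-nmx` depended only on the codec and rpc leaves and `mxaccess` depended on `mxaccess-nmx`. The function would then return three tiers: the leaves, then `mxaccess-nmx`, then `mxaccess`.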

design/followups.md F48 entry annotated with the dry-run status.
The actual publish to crates.io is deliberately not done — that
needs maintainer auth + a deliberate version bump that's a release-
cut decision, not a routine validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 12:42:22 -04:00
Joseph Doherty c606736ec3 [F53 partial] enable #![warn(missing_docs)] on consumer crates
mxaccess + mxaccess-compat now carry #![warn(missing_docs)] at the
crate root. Every public item has at least a one-line doc comment
(struct fields, enum variants, trait methods all covered).

Touched items:
- mxaccess::lib: DataChange fields, SecurityContext fields,
  TransportKind variants, TransportCapabilities fields,
  RecoveryEvent variants + their inner fields, SessionOptions
  fields, the full Error / ConnectionError / AuthError /
  ProtocolError / ConfigError / SecurityError taxonomy + nested
  fields, Transport trait method docs.
- mxaccess-compat::lib: DataChangeEvent / BufferedDataChangeEvent /
  WriteCompleteEvent / OperationCompleteEvent fields.

Protocol crates (codec, rpc, galaxy, nmx, callback, asb,
asb-nettcp) deliberately left without the lint per F53's strategy
paragraph — their consumers (mxaccess + mxaccess-compat) already
document the surfaces they re-export, and forcing one-liners on
every transport-internal item adds noise without consumer value.

Verification:
- `RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps` clean.
- `cargo test --workspace` (824 tests) green.
- `cargo clippy --workspace --all-targets -- -D warnings` clean.

design/followups.md F53 marked partially resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 12:20:47 -04:00
Joseph Doherty d149143535 [F49 steps 2 + 3] live verification: buffered recovery replay + unsubscribe skip
Step 3 (F47 buffered unsubscribe skip):
- crates/mxaccess-compat/tests/buffered_unsubscribe_skip_live.rs.
- Subscribe buffered, sleep so the engine has DataUpdates in flight,
  then call unsubscribe. Asserts Ok return without surfacing transport
  or HRESULT errors.
- Session::unsubscribe (session.rs:2261) probes the registry: if
  Buffered { .. }, it skips nmx.un_advise entirely, mirroring the .NET
  reference's `if (!subscription.IsBuffered)` guard at
  MxNativeSession.cs:361-381. If unsubscribe accidentally emitted
  UnAdvise for a buffered correlation id, the engine would return
  a non-zero HRESULT (no matching plain advise to retract), which the
  test would surface as a panic.

Step 2 (F45 buffered recovery replay):
- crates/mxaccess-compat/tests/buffered_recovery_replay_live.rs.
- Subscribe buffered, drain >=1 NMX subscription message
  (cmd=0x32 SubscriptionStatus + cmd=0x33 DataUpdate) to confirm the
  wire path is hot pre-recovery, install a RebuildFactory that calls
  NmxClient::create (the same auto-resolving COM-activation path
  Session::connect_nmx_auto uses), invoke recover_connection, drain
  >=1 NMX subscription message post-recovery.
- Verifies the replay branch in recover_connection_core re-issues
  RegisterReference (NOT AdviseSupervisory) for the buffered entry,
  mirroring MxNativeSession.ReAdviseSubscription (cs:538-569).
  Structural property is unit-tested; this confirms the engine
  actually picks back up after the rebuild + replay.
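Both steps depend on the variant of the registry entry. The sketch below covers the decision logic only. `Registration`, `WireOp`, `unsubscribe_ops`, and `replay_ops` are hypothetical simplifications of `Session::unsubscribe` and the replay branch of `recover_connection_core`. Correlation ids are reduced to `u32`, and plain entries are assumed to replay via AdviseSupervisory.

```rust
use std::collections::BTreeMap;

/// Hypothetical registry entry kinds.
#[derive(Debug, PartialEq)]
enum Registration {
    Plain,
    Buffered { item_handle: u32 },
}

/// Wire operations the session would issue.
#[derive(Debug, PartialEq)]
enum WireOp {
    UnAdvise(u32),
    AdviseSupervisory(u32),
    RegisterReference(u32),
}

/// Unsubscribe: plain entries are retracted with UnAdvise; buffered entries
/// skip it, since there is no plain advise for the engine to retract.
fn unsubscribe_ops(correlation_id: u32, entry: &Registration) -> Vec<WireOp> {
    match entry {
        Registration::Plain => vec![WireOp::UnAdvise(correlation_id)],
        Registration::Buffered { .. } => Vec::new(),
    }
}

/// Recovery replay: buffered entries are re-registered with
/// RegisterReference, never re-advised.
fn replay_ops(registry: &BTreeMap<u32, Registration>) -> Vec<WireOp> {
    registry
        .iter()
        .map(|(&correlation_id, entry)| match entry {
            Registration::Plain => WireOp::AdviseSupervisory(correlation_id),
            Registration::Buffered { item_handle } => WireOp::RegisterReference(*item_handle),
        })
        .collect()
}
```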

Both tests pass live on this Galaxy:
  cargo test -p mxaccess-compat --features live-windows-com \
      --test buffered_unsubscribe_skip_live -- --ignored --nocapture
  cargo test -p mxaccess-compat --features live-windows-com \
      --test buffered_recovery_replay_live -- --ignored --nocapture

Pulls mxaccess-nmx + mxaccess-codec into mxaccess-compat dev-deps so
the recovery test can build a RebuildFactory closure that returns
NmxClient and bind a typed broadcast Receiver.

design/followups.md F49 -> Resolved (all five steps pass live).
docs/M6-live-verification.md updated with per-step evidence + repro
commands.

F49 is fully closed out. F55 (DCOM-managed INmxSvcCallback, Path A)
and F56 (missing EnsurePublisherConnected + post-RegisterReference
AdviseSupervisory for buffered) were the two real Rust-port bugs
uncovered along the way; both resolved. Remaining post-V1 followups
(F50 Suspend/Activate Frida, F51 ASB type matrix, F52 perf, F53 doc
lint, etc.) are scoped independently and not part of F49.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 12:00:44 -04:00
Joseph Doherty 5e11b30507 [F56 resolved] subscribe paths now drive 0x33 DataUpdate frames
Root cause: `Session::subscribe` and `Session::subscribe_buffered_nmx`
were missing the `INmxService2::Connect` + `AddSubscriberEngine` RPC
pair that the .NET reference's `MxNativeSession.EnsurePublisherConnected`
(`cs:516-526`) issues before the first advise against a publishing
engine. Without those two RPCs, NmxSvc accepted the subscription
registration but the publishing engine never knew our engine was
subscribed — so it never dispatched DataUpdate frames back.

Diagnosis driven by wwtools/aalogcli reading
C:\ProgramData\ArchestrA\LogFiles. The user pointed at this tooling,
which lit up the path.

Red herring: NmxSvc's `[Warning] NmxCallback->DataReceived ... failed
with error 0x{N}` log lines turned out to be normal log spam where N
is the bufferSize of the inbound call, not a real error code. The
.NET reference's own probe triggers identical entries while still
receiving DataUpdate frames successfully.

Fix:
- SessionInner::publisher_endpoints — per-session HashMap<(platform_id,
  engine_id), ()> cache mirroring MxNativeSession._publisherEndpoints.
- Session::ensure_publisher_connected — issues Connect +
  AddSubscriberEngine, once per publisher endpoint per session.
- Session::subscribe + subscribe_buffered_nmx — both call it before
  the wire advise.
- subscribe_buffered_nmx — additionally issues AdviseSupervisory after
  RegisterReference. The .NET reference's RegisterBufferedItemAsync
  only calls RegisterReference, but on this AVEVA install
  RegisterReference alone produces the registration result + heartbeat
  callbacks without ever starting DataUpdate dispatch; AdviseSupervisory
  unblocks the dispatch.
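The once-per-endpoint handshake can be sketched as a set-backed guard. A `HashMap<_, ()>`, as in the commit, behaves like a `HashSet`. This is a minimal sketch. The `Session` struct, the `u16` id types, and recording RPC names as strings are assumptions for illustration. The real methods issue the RPCs over NMX.

```rust
use std::collections::HashSet;

/// Hypothetical session that records which RPCs it would issue.
#[derive(Default)]
struct Session {
    publisher_endpoints: HashSet<(u16, u16)>, // (platform_id, engine_id)
    issued: Vec<&'static str>,
}

impl Session {
    /// Connect + AddSubscriberEngine, at most once per publisher endpoint.
    fn ensure_publisher_connected(&mut self, platform_id: u16, engine_id: u16) {
        // `insert` returns true only the first time this endpoint is seen.
        if self.publisher_endpoints.insert((platform_id, engine_id)) {
            self.issued.push("Connect");
            self.issued.push("AddSubscriberEngine");
        }
    }

    fn subscribe(&mut self, platform_id: u16, engine_id: u16) {
        self.ensure_publisher_connected(platform_id, engine_id);
        self.issued.push("AdviseSupervisory");
    }

    /// Buffered path: RegisterReference, then the extra AdviseSupervisory
    /// that unblocks DataUpdate dispatch on this install.
    fn subscribe_buffered(&mut self, platform_id: u16, engine_id: u16) {
        self.ensure_publisher_connected(platform_id, engine_id);
        self.issued.push("RegisterReference");
        self.issued.push("AdviseSupervisory");
    }
}
```

Two plain subscribes against the same endpoint issue Connect + AddSubscriberEngine only once, so the second subscribe costs a single RPC.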

Live verification (`TestMachine_001.TestChangingInt`, a tag that
updates >1×/s):
  cargo test -p mxaccess-compat --features live-windows-com \
      --test plain_subscribe_live -- --ignored --nocapture
  cargo test -p mxaccess-compat --features live-windows-com \
      --test buffered_subscribe_live -- --ignored --nocapture
Both pass — `cmd=0x32` SubscriptionStatus + sequence of `cmd=0x33`
DataUpdate frames flow as expected. Tests assert on the raw
Session::callbacks() broadcast (not the typed Subscription::next
DataChange path) because the engine reports quality=Uncertain
value=null for this attribute on this Galaxy — the wire-level
subscription is what F56 was about, not the value content.

DcomCallbackSink reverted to S_OK return for both DataReceivedRaw
and StatusReceivedRaw (the bytes-processed / sentinel HRESULT
experiments during diagnosis turned out to be irrelevant — the
"failed with error 0xN" logs come from NmxSvc regardless of the
return value).

design/followups.md F49 + F56 + docs/M6-live-verification.md updated:
F56 resolved, F49 steps 1 + 4 + 5 pass live, steps 2 + 3 pending
(now executable on this fixture).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 11:32:07 -04:00
Joseph Doherty c6332c26a1 [F49 step 4 + step 5 + doc] live evidence: metrics smoke pass, M6-live-verification.md
F49 step 4 (F40 metrics smoke):
- crates/mxaccess-compat/tests/metrics_smoke_live.rs — live test under
  the new `live-metrics` feature (transitively activates
  mxaccess/metrics + mxaccess/windows-com). Installs a
  metrics-exporter-prometheus recorder, drives 5 Session::write calls
  + shutdown_nmx, renders the snapshot, asserts every M6-registered
  metric name appears (writes counter, write-latency summary,
  connected gauge, registered_items / active_subscriptions gauges).
  Pass on the live AVEVA install.

  Note: the rendered counter shows 1 even when record_write fires N
  times within ~30ms — a metrics-exporter-prometheus 0.16 quirk under
  tight loops, not a Rust port bug. Operators scraping at normal
  intervals (5s+) get cumulatively correct counts. Documented in the
  test + in M6-live-verification.md so future runs aren't surprised.

F49 status update (in design/followups.md):
- Step 4: PASS (this commit)
- Step 5: PASS (was unblocked by F55 / Path A — already committed)
- Steps 1-3: carved out to F56 (Galaxy fixture state, not a Rust bug)

docs/M6-live-verification.md:
- Per-step evidence table with test invocations + outcomes.
- Sample Prometheus snapshot for step 4.
- Reproduction commands for the live tests.
- F56 explanation cross-referenced from step 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 10:36:09 -04:00
Joseph Doherty df3457c54a [F56] subscribe / subscribe_buffered: split-form wire body + diagnose Galaxy fixture gap
Three real fixes + one architectural diagnosis:

1. Session::subscribe_buffered_nmx now sends the .NET-reference split
   form on the wire:
     item_definition = "<attr>.property(buffer)"   (was: full reference)
     item_context    = "<object_tag_name>"          (was: empty)
     item_handle     = SessionInner::next_item_handle.fetch_add(1)
                       (was: hardcoded 0)
   Verified byte-identical against captures/082 + 094 by the existing
   buffered_register_reference_parity unit tests. The
   item_handle counter mirrors MxNativeCompatibilityServer's
   _nextItemHandle++ at MxNativeSession.cs:613.

2. New live tests:
   - tests/buffered_subscribe_live.rs (F49 step 1) — uses real Galaxy
     metadata via SqlTagResolver + connect_nmx_auto, drives a
     background writer at 500ms cadence to force value-changes,
     drains DataChange events from Subscription.
   - tests/plain_subscribe_live.rs — same harness over plain
     Session::subscribe (NOT buffered), used to isolate whether
     "no DataUpdate" is buffered-specific (it's not — both fail).

   Both pull tracing-subscriber as a dev-dep so `RUST_LOG=trace`
   surfaces dcom_sink + router activity.

3. mxaccess-galaxy/sql_resolver.rs: drop the inner-attribute
   `#![cfg(feature = "galaxy-resolver")]` — the module-level cfg on
   `pub mod sql_resolver` in lib.rs already handles this and Rust
   1.85's clippy::duplicated_attributes lint flagged the duplicate
   once mxaccess-compat dev-deps activated the feature.

4. F56 finding (diagnosis, NOT a bug fix): the engine on this Galaxy
   install does not have an active value for TestChildObject.TestInt.
   Confirmed by running the .NET reference's own probe:

     dotnet run --project src/MxNativeClient.Probe -c Release \
       -- --probe-session-subscribe --tag=TestChildObject.TestInt \
       --subscribe-hold-seconds=10

   ...returns ONE 0x32 SubscriptionStatus (status=3 detail=3
   quality=0x00C0 Uncertain value=null) and zero 0x33 DataUpdates —
   matching the Rust port's symptom exactly. Not a Rust port bug,
   not a wire-byte gap. F49 steps 1-3 need either an actively-
   scanned tag or local Galaxy reconfiguration to scan
   TestChildObject.TestInt.
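
The split form from item 1 can be sketched in a few lines. This is a hypothetical illustration, not the port's code. `RegisterReferenceFields` and `split_buffered_reference` are invented names. Splitting at the first `.` is an assumed convention for `<object_tag_name>.<attr>`, and the counter's seed value is left to the caller.

```rust
use std::sync::atomic::{AtomicI32, Ordering};

/// Hypothetical view of the three RegisterReference fields listed in item 1.
#[derive(Debug, PartialEq, Eq)]
pub struct RegisterReferenceFields {
    pub item_definition: String,
    pub item_context: String,
    pub item_handle: i32,
}

/// Splits "<object_tag_name>.<attr>" at the first '.' (an assumed convention)
/// into the buffered split form. Returns None, without consuming a handle,
/// when either side of the split is missing.
pub fn split_buffered_reference(
    full_reference: &str,
    next_item_handle: &AtomicI32,
) -> Option<RegisterReferenceFields> {
    let (object_tag, attr) = full_reference.split_once('.')?;
    if object_tag.is_empty() || attr.is_empty() {
        return None;
    }
    Some(RegisterReferenceFields {
        item_definition: format!("{attr}.property(buffer)"),
        item_context: object_tag.to_string(),
        // fetch_add returns the previous value, like C#'s `_nextItemHandle++`.
        item_handle: next_item_handle.fetch_add(1, Ordering::Relaxed),
    })
}
```

For example, with a counter created as `AtomicI32::new(1)` (seed assumed), `TestChildObject.TestInt` yields `item_definition = "TestInt.property(buffer)"`, `item_context = "TestChildObject"` and `item_handle = 1`. The next call gets handle 2.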

Workspace tests + clippy clean under both feature configurations.
F56 entry in design/followups.md updated with the full diagnostic
chain so future-me / future-collaborators can pick it up without
re-tracing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 10:27:08 -04:00
Joseph Doherty af15fe7587 [F49 step 1 + F56] callback router: peel envelope before parsing subscription / 0x11 frames
The router used to call NmxSubscriptionMessage::parse_inner directly
on the COM-stub-delivered body, but the wire bytes arrive wrapped in
a ProcessDataReceived envelope (46-byte header + optional 4-byte
length prefix); parse_inner expects post-envelope bytes. Result:
every 0x33 DataUpdate that ever arrived was silently dropped.

Mirrors the .NET reference's MxNativeSession.OnCallbackReceived flow
at cs:582-606 — three sequential parse attempts:
  1. NmxOperationStatusMessage::try_parse_process_data_received_body  (already wired)
  2. NmxReferenceRegistrationResultMessage::try_parse_...              (NEW — was missing)
  3. NmxSubscriptionMessage::try_parse_process_data_received_body      (NEW — was wrong)
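
The envelope-then-parse order above can be sketched as follows. Only the 46-byte header and the order of the three attempts come from this commit. Everything else is a hypothetical placeholder for the real `try_parse_process_data_received_body` functions:

- The length-prefix heuristic: the next 4 bytes are treated as a prefix only when, read as a LE u32, they equal the remaining length.
- The three stand-in parsers, keyed on a first byte or a 5-byte length.

Unlike the real parsers, which each peel the envelope themselves, this sketch peels it once up front.

```rust
/// Size of the ProcessDataReceived envelope header (from the commit text).
const ENVELOPE_HEADER_LEN: usize = 46;

/// Strips the envelope header and, if present, a 4-byte length prefix.
/// The prefix-detection rule is an assumption, not the codec's real logic.
pub fn peel_envelope(body: &[u8]) -> Option<&[u8]> {
    let rest = body.get(ENVELOPE_HEADER_LEN..)?;
    if rest.len() >= 4 {
        let prefix = u32::from_le_bytes([rest[0], rest[1], rest[2], rest[3]]) as usize;
        if prefix == rest.len() - 4 {
            return Some(&rest[4..]);
        }
    }
    Some(rest)
}

#[derive(Debug, PartialEq, Eq)]
pub enum Routed {
    OperationStatus(Vec<u8>),
    RegistrationResult(Vec<u8>),
    Subscription(Vec<u8>),
    Unparsed,
}

// Stand-in parsers: the discriminators below are placeholders.
fn try_operation_status(inner: &[u8]) -> Option<Routed> {
    (inner.len() == 5).then(|| Routed::OperationStatus(inner.to_vec()))
}

fn try_registration_result(inner: &[u8]) -> Option<Routed> {
    (inner.first() == Some(&0x11)).then(|| Routed::RegistrationResult(inner.to_vec()))
}

fn try_subscription(inner: &[u8]) -> Option<Routed> {
    matches!(inner.first(), Some(&0x32) | Some(&0x33))
        .then(|| Routed::Subscription(inner.to_vec()))
}

/// Tries the three parsers in the .NET reference's order and falls through
/// to `Unparsed` (where the real router emits a trace!) when none match.
pub fn route(body: &[u8]) -> Routed {
    let inner = match peel_envelope(body) {
        Some(inner) => inner,
        None => return Routed::Unparsed,
    };
    try_operation_status(inner)
        .or_else(|| try_registration_result(inner))
        .or_else(|| try_subscription(inner))
        .unwrap_or(Routed::Unparsed)
}
```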

Adds:
- NmxSubscriptionMessage::try_parse_process_data_received_body — peels
  envelope via NmxObservedEnvelope::parse_process_data_received_body_flexible,
  then dispatches to existing parse_inner.
- NmxReferenceRegistrationResultMessage::try_parse_process_data_received_body —
  same shape, for the 0x11 registration-result frame.
- Router branch for 0x11 — currently traces the assigned item_handle and
  drops the frame (matches the .NET reference, which fires a
  ReferenceRegistrationReceived event with no consumer in the codebase).
- Router fall-through trace! when none of the three parse attempts
  matches, so future unparseable bodies surface in RUST_LOG=trace
  instead of vanishing.
- DcomCallbackSink::forward — trace! per inbound callback so
  RUST_LOG=mxaccess_callback=trace surfaces opnum + size.
- crates/mxaccess-compat/tests/buffered_subscribe_live.rs — F49 step 1
  live test that drives subscribe_buffered + a 500ms-cadence writer.
  Also pulls tracing-subscriber as a dev-dep so the test can dump
  router activity.

Existing router_task_decodes_callback_invoked_into_broadcast unit test
updated to wrap its synthetic 0x32 body in an envelope so the new
parse path actually accepts it.

Live result: F56 — the buffered round-trip *registers* successfully
(RegisterReference returns HRESULT 0; engine sends one 0x11
RegistrationResult + one 51-byte op-status per write, perfectly
clocked) but the engine never sends a 0x33 DataUpdate. Rust-port-
specific gap vs the .NET reference's working buffered path; root
cause is likely a field-level difference in the RegisterReference
body or a missing post-RegisterReference step. Captured as F56 in
design/followups.md, blocking F49 step 1; F56's DoD is the same
live test reporting >=3 DataChange arrivals.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 09:50:57 -04:00
Joseph Doherty 2fc327a8d5 [F55 Path A] DCOM-managed INmxSvcCallback sink
Replace the hand-rolled CallbackExporter (TCP listener + custom
OBJREF) with a real `windows-rs` `#[implement]` COM class for
INmxSvcCallback, marshalled via CoMarshalInterface. NmxSvc validates
the callback OBJREF by calling IObjectExporter::ResolveOxid against
the local RPCSS at 127.0.0.1:135; hand-rolled OXIDs aren't registered
there, which is why RegisterEngine2 returned RPC_S_SERVER_UNAVAILABLE
(1722) on every live attempt. CoMarshalInterface registers the OXID
with RPCSS automatically, so the SCM-side resolution succeeds.

Mirrors MxNativeSession.CreateRegisteredService (cs:624), which is
the .NET reference's working path:
  ComObjRefProvider.MarshalInterfaceObjRef(callback,
    INmxSvcCallback, DifferentMachine)

Layout:
- mxaccess-callback::dcom_sink — INmxSvcCallback + DcomCallbackSink
  + create_dcom_callback_sink_objref. Forwards inbound calls into
  the same CallbackEvent::CallbackInvoked { opnum, body } shape the
  legacy exporter produces, so callback_router stays path-agnostic.
- Session::from_nmx_client — branched on `windows-com`. Real DCOM
  sink when on; legacy CallbackExporter when off (kept for unit
  tests that run against an in-process fake NMX peer).
- SessionInner.dcom_sink_holder: Option<IUnknownHolder> — keeps the
  COM ref alive for the session's lifetime; shutdown_nmx drops it.
- mxaccess-rpc + mxaccess-callback: windows-rs 0.59 → 0.62. The 0.59
  #[implement] macro generates code that doesn't compile under
  edition 2024; 0.62 is fixed.
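
The following sketch shows why `callback_router` can stay path-agnostic: both producers emit the same `CallbackEvent` value into one channel. Only `CallbackEvent::CallbackInvoked { opnum, body }` comes from the commit. The `CallbackSource` trait, the two stand-in types, the `u16` opnum type, and the use of `std::sync::mpsc` (rather than whatever async channel the port actually uses) are assumptions.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// The one event shape the router consumes (variant name from the commit).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CallbackEvent {
    CallbackInvoked { opnum: u16, body: Vec<u8> },
}

/// Hypothetical trait: anything that receives inbound INmxSvcCallback calls.
pub trait CallbackSource {
    fn forward(&self, opnum: u16, body: &[u8]);
}

/// Stand-in for the DCOM `#[implement]` sink (no COM involved here).
pub struct DcomSinkStandIn {
    pub tx: Sender<CallbackEvent>,
}

/// Stand-in for the legacy TCP CallbackExporter.
pub struct LegacyExporterStandIn {
    pub tx: Sender<CallbackEvent>,
}

impl CallbackSource for DcomSinkStandIn {
    fn forward(&self, opnum: u16, body: &[u8]) {
        // A real COM method would return S_OK to NmxSvc after queueing.
        let _ = self.tx.send(CallbackEvent::CallbackInvoked { opnum, body: body.to_vec() });
    }
}

impl CallbackSource for LegacyExporterStandIn {
    fn forward(&self, opnum: u16, body: &[u8]) {
        let _ = self.tx.send(CallbackEvent::CallbackInvoked { opnum, body: body.to_vec() });
    }
}

/// Creates the shared channel both producers clone a sender from.
pub fn callback_channel() -> (Sender<CallbackEvent>, Receiver<CallbackEvent>) {
    channel()
}

/// Router side: sees only CallbackEvent, never which transport produced it.
pub fn drain_router(rx: &Receiver<CallbackEvent>) -> Vec<(u16, usize)> {
    rx.try_iter()
        .map(|event| match event {
            CallbackEvent::CallbackInvoked { opnum, body } => (opnum, body.len()),
        })
        .collect()
}
```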

Live result: cargo test -p mxaccess-compat --features
live-windows-com --test lmx_write_complete_live -- --ignored
--nocapture passes end-to-end. RegisterEngine2 OK, write
round-trips, OnWriteComplete fires with the captured MxStatus shape.

Unblocks F49 step 5; F55 marked Resolved in design/followups.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 09:25:44 -04:00
Joseph Doherty 0a274af76f [F55] Path C investigation: NmxSvc requires SCM-registered OXID for callbacks
Captured OBJREF byte structures from both paths via the .NET probe:
- `--probe-callback-marshal`: DCOM-marshalled, 338 bytes, succeeds
  (when used inside `MxNativeSession.Open` → `CreateRegisteredService`).
- `--probe-register-managed-callback`: hand-rolled, 162 bytes, fails
  with `RegisterEngine2 → 0x800706BA RPC_S_SERVER_UNAVAILABLE`.

The structural diff:
- `std_flags`: DCOM=`0x0A80` (SORF_OXRES4+6+8) vs hand-rolled=`0x280`
  (SORF_OXRES4+6). Bit `0x0800` (SORF_OXRES8) only set in DCOM.
- ncacn_ip_tcp bindings: DCOM=4 with no ports; hand-rolled=1 with
  explicit `[port]`.
- Total size: 338 vs 162 bytes.
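
The fields compared above sit at fixed offsets in a standard OBJREF. The sketch below follows the OBJREF / STDOBJREF layout in [MS-DCOM]: the `MEOW` signature, the flags, a 16-byte IID, then the 40-byte STDOBJREF starting at byte 24. The type and function names are hypothetical. The sketch stops before the DUALSTRINGARRAY that carries the ncacn_ip_tcp bindings. XOR-ing the two observed `std_flags` values (`0x0A80 ^ 0x0280`) isolates the single differing bit, `0x0800`.

```rust
use std::convert::TryInto;

const OBJREF_SIGNATURE: u32 = 0x574F_454D; // bytes 4D 45 4F 57 = "MEOW"
const OBJREF_STANDARD: u32 = 0x0000_0001;

/// STDOBJREF fields of a standard OBJREF (hypothetical type name).
#[derive(Debug, PartialEq, Eq)]
pub struct StdObjRefHeader {
    pub std_flags: u32,
    pub public_refs: u32,
    pub oxid: u64,
    pub oid: u64,
    pub ipid: [u8; 16],
}

fn u32_at(b: &[u8], off: usize) -> Option<u32> {
    Some(u32::from_le_bytes(b.get(off..off + 4)?.try_into().ok()?))
}

fn u64_at(b: &[u8], off: usize) -> Option<u64> {
    Some(u64::from_le_bytes(b.get(off..off + 8)?.try_into().ok()?))
}

/// Parses the fixed-size prefix of an OBJREF_STANDARD blob.
pub fn parse_std_objref(b: &[u8]) -> Option<StdObjRefHeader> {
    if u32_at(b, 0)? != OBJREF_SIGNATURE || u32_at(b, 4)? != OBJREF_STANDARD {
        return None;
    }
    // Bytes 8..24 hold the interface IID; STDOBJREF starts at offset 24.
    Some(StdObjRefHeader {
        std_flags: u32_at(b, 24)?,
        public_refs: u32_at(b, 28)?,
        oxid: u64_at(b, 32)?,
        oid: u64_at(b, 40)?,
        ipid: b.get(48..64)?.try_into().ok()?,
    })
}
```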

Tested the simplest fix (hand-rolled `std_flags = 0x0A80` to match
DCOM): **still fails with the same 1722.** Reverted.

**Diagnosis updated in F55:** on receiving RegisterEngine2, NmxSvc
appears to call `IObjectExporter::ResolveOxid` against the local
SCM (`127.0.0.1:135`) to resolve the callback OBJREF's OXID, and
then to dial the resulting bindings. Our hand-rolled OXID is never
registered with RPCSS, so the SCM-side resolution fails and NmxSvc
returns RPC_S_SERVER_UNAVAILABLE. This matches:
- the symptom (1722),
- the sub-second timing (no TCP dial-back to our listener attempted),
- the fact that the .NET `ManagedCallbackExporter` (same hand-rolled
  approach) ALSO fails identically.

DCOM marshalling fixes this because `CoMarshalInterface` internally
registers the OXID with RPCSS. The bindings have no port because
RPCSS returns the dynamic port from the DCOM stub layer.

**Conclusion: Path A is the architecturally correct fix** — the
callback exporter must be a DCOM-managed object (e.g. via
`windows-rs` `#[implement]`) for NmxSvc to accept the callback.
The hand-rolled-listener-with-explicit-port approach is
fundamentally incompatible with NmxSvc's callback validation, in
both Rust and the .NET reference.

Path C (cheap investigation) is exhausted; F55 verdict updated to
recommend Path A explicitly.

`cargo test --workspace` 824 passing; clippy `-D warnings` clean
across both feature configurations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 08:55:59 -04:00
Joseph Doherty c5d611d6fa [F12 partial + F55] hold IUnknown for client lifetime + diagnose RegisterEngine2 1722
**F12 partial improvement** (`mxaccess-rpc::IUnknownHolder` + `mxaccess-nmx`):

- New `IUnknownHolder` newtype that owns an MTA-resident COM proxy
  with `unsafe impl Send + Sync`. Mirrors the .NET reference's
  `ManagedNmxService2Client._activatedComObject` private field
  (`cs:15`).
- New `activate_and_marshal_iunknown_objref(prog_id, ctx)` returns
  `(Vec<u8>, IUnknownHolder)`. Existing
  `marshal_activated_iunknown_objref` retained as a wrapper that
  drops the holder (kept for inline-use callers).
- `NmxClient` gains an `activated_com_object: Option<IUnknownHolder>`
  field, populated by `Self::create` from the new helper.
  `Self::connect` / `Self::from_bound_transport` set it `None` (no
  COM activation in those paths).
- Holding the IUnknown for the client's lifetime keeps the
  SCM-tracked OXID valid; without it the COM ref count drops to
  zero and the SCM may release the activated server-side instance,
  making subsequent `ResolveOxid` / `RemQueryInterface` calls
  return `RPC_S_SERVER_UNAVAILABLE`.
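
The ownership pattern above can be sketched without real COM. This is a minimal sketch under stated assumptions. The raw pointer and `release_stand_in` (which only counts calls) are placeholders for a windows-rs `IUnknown` and its `Release`. `NmxClientSketch` is a hypothetical stand-in for `NmxClient`. The `unsafe impl Send + Sync` is sound only under the MTA-resident assumption the commit states.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts stand-in Release calls so the sketch can be checked without COM.
pub static RELEASE_CALLS: AtomicUsize = AtomicUsize::new(0);

fn release_stand_in(_ptr: *mut c_void) {
    RELEASE_CALLS.fetch_add(1, Ordering::SeqCst);
}

/// Owns one reference to an activated COM object for as long as it lives.
pub struct IUnknownHolder {
    ptr: *mut c_void,
}

impl IUnknownHolder {
    /// # Safety
    /// `ptr` must be an owned reference to an MTA-resident proxy.
    pub unsafe fn from_raw(ptr: *mut c_void) -> Self {
        Self { ptr }
    }
}

// SAFETY (assumed): an MTA proxy may be called from any MTA thread, so moving
// or sharing the holder across runtime worker threads keeps apartment rules.
unsafe impl Send for IUnknownHolder {}
unsafe impl Sync for IUnknownHolder {}

impl Drop for IUnknownHolder {
    fn drop(&mut self) {
        if !self.ptr.is_null() {
            release_stand_in(self.ptr); // real code: IUnknown::Release
        }
    }
}

/// Hypothetical client shape: `None` on the connect / from_bound_transport paths.
pub struct NmxClientSketch {
    pub activated_com_object: Option<IUnknownHolder>,
}
```

The reference is released exactly once, when the client (and thus the holder) is dropped. Because the holder is `Send`, that drop may happen on a different thread from the one that activated the object.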

**F55 (new) — hand-rolled callback exporter rejected by RegisterEngine2**

Five-step instrumentation of `Session::connect_nmx_auto` proves all
six COM-activation / RemQI / final-bind steps succeed. The 1722
fault originates at `RegisterEngine2` itself:

```
from_nmx_client: callback hostname="DESKTOP-6JL3KKO" port=57886 obj_ref_len=162
from_nmx_client: callback obj_ref hex: 4d454f57010000...
from_nmx_client: RegisterEngine2 (31112, mxaccess.31112)
from_nmx_client: RegisterEngine2 FAIL: Transport(Fault { status: 2147944122 })
```

Status `2147944122` is `0x800706BA`: Win32 error 1722
(`RPC_S_SERVER_UNAVAILABLE`) wrapped as an HRESULT (`FACILITY_WIN32`).
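
The numeric form of that fault can be checked mechanically. This sketch unpacks an HRESULT with the same masks as the Windows SDK `HRESULT_FACILITY` / `HRESULT_CODE` macros (`(hr >> 16) & 0x1FFF` and `hr & 0xFFFF`). The function names are hypothetical.

```rust
/// FACILITY_WIN32, the facility HRESULT_FROM_WIN32 stamps on Win32 errors.
pub const FACILITY_WIN32: u16 = 7;
/// Win32 error 1722.
pub const RPC_S_SERVER_UNAVAILABLE: u16 = 1722;

/// Splits an HRESULT into (failure bit, facility, code).
pub fn decompose_hresult(hr: u32) -> (bool, u16, u16) {
    let failed = hr & 0x8000_0000 != 0;
    let facility = ((hr >> 16) & 0x1FFF) as u16;
    let code = (hr & 0xFFFF) as u16;
    (failed, facility, code)
}

/// Recovers the Win32 error from a FACILITY_WIN32 failure HRESULT.
pub fn win32_error_from_hresult(hr: u32) -> Option<u16> {
    match decompose_hresult(hr) {
        (true, FACILITY_WIN32, code) => Some(code),
        _ => None,
    }
}
```

The `Fault { status: 2147944122 }` in the log decomposes to a failure with facility 7 and code 1722.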

**Critical finding: the .NET reference's `--probe-register-managed-callback`
(which uses the same hand-rolled `ManagedCallbackExporter` approach
as the Rust port) ALSO fails with the same `0x800706BA` fault.**
Only `--probe-session-write`, which uses
`ComObjRefProvider.MarshalInterfaceObjRef(callback, ...)` to build
the OBJREF via Windows DCOM proxy/stub marshalling, succeeds. So
this is an architectural artifact of the hand-rolled-callback
design, not a Rust port regression.

`design/followups.md` F55 entry documents the three resolution
paths (switch to DCOM-marshalled callback / hybrid / continue
investigating OBJREF rejection at NmxSvc).

F49 stays open with a refined diagnostic — the per-feature live
verification is gated on F55's resolution.

Workspace tests still 824 passing; clippy `-D warnings` clean
across both feature configurations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 08:50:30 -04:00
Joseph Doherty e5b31fadb1 [F49] live-test scaffolding for F54 OnWriteComplete + COM probe diagnostic
Live attempt against AVEVA on this dev host produced two artefacts:

**`crates/mxaccess-compat/tests/lmx_write_complete_live.rs`** — the
F54 OnWriteComplete round-trip test. Compiles + runs against the
live AVEVA install via either path:
- `--features live-windows-com` (preferred): uses
  `Session::connect_nmx_auto` so the COM activation reference is
  held in-process for the duration of the test.
- Default features (fallback): shells out to
  `MxNativeClient.Probe --probe-resolve-oxid-managed-ntlm-integrity`
  + `--probe-remqi-managed` to learn the per-session NMX endpoint +
  INmxService2 IPID, then uses `Session::connect_nmx`.

Both code paths are wired and the test runs through endpoint
resolution + IPID extraction successfully. The connect step itself
fails with `Status { detail: 1722 }` (RPC_S_SERVER_UNAVAILABLE).

**`crates/mxaccess-rpc/examples/com-marshal-probe.rs`** — minimal
one-shot binary that calls
`marshal_activated_iunknown_objref("NmxSvc.NmxService",
DifferentMachine)` in isolation. Confirms the COM activation +
CoMarshalInterface chain works fine standalone (returns a 338-byte
OBJREF with valid OXID/IPID structure). The 1722 in the live test
is therefore downstream of the activation — likely a COM-apartment
threading interaction with the tokio multi-thread runtime.

This is an F12-related issue (auto-resolve hardening), not an F54
issue. F54's correctness is covered by the existing unit-level
integration tests:
- `mxaccess::session::tests::router_populates_operation_status_context_from_pending_ops_fifo`
- `mxaccess::session::tests::write_handle_correlates_with_router_emitted_status`
- `mxaccess_compat::tests::drain_routes_write_status_to_on_write_complete`
- `mxaccess_compat::tests::drain_routes_non_write_status_to_on_operation_complete`

`design/followups.md` F49 entry updated to reflect:
- F54 added as a fifth row in the live-verification scope.
- "Live attempt 2026-05-06" sub-section documents the 1722 issue +
  what was verified (.NET probe end-to-end works against same
  install; Rust COM activation works in isolation; the failure is
  Rust-port-specific to `connect_nmx_auto` under tokio).
- F49 now Blocked-by F12 hardening (the 1722 path).

New `live-windows-com` feature on `mxaccess-compat` propagates to
`mxaccess/windows-com` for the test binary.

Workspace 824 → 824 tests; clippy + rustdoc clean across both
feature configurations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 08:23:01 -04:00
Joseph Doherty 04c10babfb [F54 test] end-to-end smoke: write_with_handle ↔ callback_router boundary
Adds `write_handle_correlates_with_router_emitted_status` — the
closest-to-live test we can write without an AVEVA endpoint, pinning
the F54 boundary the C# `OnWriteComplete` callback ultimately depends
on.

The existing tests cover the layers individually:
- `write_value_with_handle_inserts_into_pending_ops` — write API
  populates pending_ops with the right correlation id.
- `router_populates_operation_status_context_from_pending_ops_fifo`
  — callback_router consumes a frame + the registry, emits a typed
  OperationStatus with context attached.
- `drain_routes_write_status_to_on_write_complete` (mxaccess-compat)
  — drain function routes Write op_kind to on_write_complete_tx.

What was missing: a test that combines the public
`write_value_with_handle` API with a real callback_router invocation
against the SAME `pending_ops` Arc the write populated. The new test:

1. Builds a Session via `connect_test_session`.
2. Calls `session.write_value_with_handle("TestObj.TestInt", ...)` —
   gets a real `WriteHandle { correlation_id }` and a real entry in
   `pending_ops` (no manual insertion).
3. Spins a parallel `callback_router` over the SESSION's
   `pending_ops` Arc + a fake event_tx (the live exporter's
   internal channel isn't reachable from tests; this is the
   established workaround pattern from
   `router_task_decodes_callback_invoked_into_broadcast`).
4. Injects the proven `WRITE_COMPLETE_OK` 5-byte frame.
5. Asserts the emitted `OperationStatus.context.correlation_id`
   equals the cid the write returned, that op_kind is Write, that
   reference is the original tag string, and that
   `pending_ops` is now empty (one-shot popped).

This closes the integration-test gap the user flagged. Live AVEVA
verification still falls under F49.

Workspace 823 → 824 tests; clippy + rustdoc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 07:57:15 -04:00
Joseph Doherty 4ff511bbed [F54] per-operation correlation + compat OnWriteComplete fan-out
Closes the residual that R3/R4 Path A's commit `c73a33e` deferred:
the OperationStatus.context field was always None because no
in-flight correlation map existed in SessionInner, and the
mxaccess-compat broadcast channels for OnWriteComplete /
OperationComplete were exposed on the public API but had no
fan-out task draining session events into them.

**mxaccess (Part 1 — per-operation correlation):**

- New `pending_ops: Mutex<HashMap<[u8; 16], OperationContext>>` on
  SessionInner. Populated when `Session::write*` / `subscribe*`
  dispatches an outstanding operation; entry removed when the
  matching OperationStatus event fires (one-shot semantics).
- New `Session::write_with_handle` (and equivalents for the secured /
  timestamped paths) returns a `WriteHandle { correlation_id }` so
  consumers can correlate completions back to their originating
  call. Existing `write` / `write_value` / etc. signatures unchanged
  and delegate to the handle-returning variant.
- Callback router extended to look up `pending_ops` by correlation_id
  on each operation-status event. When found, populates
  `OperationStatus.context: Some(OperationContext { correlation_id,
  op_kind, reference, retry_count: 0 })`. When not found, falls
  through with `context: None` (verbatim-preserve per CLAUDE.md).
- New unit tests assert: matching correlation_id populates context,
  unknown correlation_id leaves context None, the entry is removed
  from `pending_ops` after one event fires.
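
The one-shot correlation in Part 1 can be sketched with a plain `Mutex<HashMap<[u8; 16], OperationContext>>`. This is a minimal sketch of the keyed lookup described above, not the session code. `PendingOps`, `begin_write` and `take_context` are hypothetical names. The counter-based correlation-id scheme and the `u32` type of `retry_count` are assumptions.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OperationKind { Write, WriteSecured, Read, Subscribe, Unsubscribe, Activate, Suspend, Other }

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OperationContext {
    pub correlation_id: [u8; 16],
    pub op_kind: OperationKind,
    pub reference: String,
    pub retry_count: u32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct WriteHandle {
    pub correlation_id: [u8; 16],
}

#[derive(Default)]
pub struct PendingOps {
    next: AtomicU64,
    map: Mutex<HashMap<[u8; 16], OperationContext>>,
}

impl PendingOps {
    /// Write path: registers the outstanding op and hands back its handle.
    pub fn begin_write(&self, reference: &str) -> WriteHandle {
        // Hypothetical id scheme: a nonzero counter widened to 16 bytes.
        let n = self.next.fetch_add(1, Ordering::Relaxed) + 1;
        let mut correlation_id = [0u8; 16];
        correlation_id[..8].copy_from_slice(&n.to_le_bytes());
        let ctx = OperationContext {
            correlation_id,
            op_kind: OperationKind::Write,
            reference: reference.to_string(),
            retry_count: 0,
        };
        self.map.lock().unwrap().insert(correlation_id, ctx);
        WriteHandle { correlation_id }
    }

    /// Router path: removes and returns the context (one-shot).
    /// Unknown ids yield None, i.e. `context: None` on the event.
    pub fn take_context(&self, correlation_id: &[u8; 16]) -> Option<OperationContext> {
        self.map.lock().unwrap().remove(correlation_id)
    }
}
```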

**mxaccess-compat (Part 2 — compat-layer fan-out):**

- New `correlation_to_item: tokio::sync::Mutex<HashMap<[u8; 16], i32>>`
  on LmxClientInner.
- `LmxClient::write` / `write_2` / `write_secured` / `write_secured_2`
  call `Session::write_with_handle` (or equivalent) and insert
  `correlation_id → item_handle` into the map before returning.
- `LmxClient::register` / `register_asb` spawn a background task that
  drains `session.operation_status_stream()`. Per event, looks up
  `correlation_to_item[event.context?.correlation_id]` to find the
  item_handle, then routes:
  - `OperationKind::Write` / `OperationKind::WriteSecured` →
    `WriteCompleteEvent { server_handle, item_handle, statuses,
    is_during_recovery }` into `on_write_complete_tx`.
  - Other variants → `OperationCompleteEvent { ... }` into
    `on_operation_complete_tx`.
  - Removes the correlation_id from `correlation_to_item` after
    firing (one-shot).
- Events with no matching item_handle (correlation_id not in map)
  are dropped silently — no bogus item_handle=0 events.
- Task cancelled on LmxClient drop via `JoinHandle::abort` (matches
  the existing `subscription_task` pattern).
- New unit tests cover: Write op routes to on_write_complete, Read
  op routes to on_operation_complete, unknown correlation_id is
  dropped.
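
The Part 2 routing rule can be sketched synchronously. This sketch uses a plain `HashMap` and `std::sync::mpsc` where the real code uses `tokio::sync::Mutex` and broadcast channels. `FanOut`, `StatusEvent` and `with_channels` are hypothetical, and so are the `Vec<u32>` placeholder for the status array and the `OperationCompleteEvent` fields (the commit elides them).

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OperationKind { Write, WriteSecured, Read, Other }

/// Minimal view of a session OperationStatus event (hypothetical shape).
pub struct StatusEvent {
    pub context: Option<([u8; 16], OperationKind)>,
    pub statuses: Vec<u32>,
    pub is_during_recovery: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub struct WriteCompleteEvent {
    pub server_handle: i32,
    pub item_handle: i32,
    pub statuses: Vec<u32>,
    pub is_during_recovery: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub struct OperationCompleteEvent {
    pub server_handle: i32,
    pub item_handle: i32,
    pub statuses: Vec<u32>,
}

pub struct FanOut {
    server_handle: i32,
    correlation_to_item: HashMap<[u8; 16], i32>,
    on_write_complete_tx: Sender<WriteCompleteEvent>,
    on_operation_complete_tx: Sender<OperationCompleteEvent>,
}

impl FanOut {
    /// Builds the router plus the two consumer-facing receivers.
    pub fn with_channels(server_handle: i32) -> (Self, Receiver<WriteCompleteEvent>, Receiver<OperationCompleteEvent>) {
        let (wtx, wrx) = channel();
        let (otx, orx) = channel();
        let fan = FanOut {
            server_handle,
            correlation_to_item: HashMap::new(),
            on_write_complete_tx: wtx,
            on_operation_complete_tx: otx,
        };
        (fan, wrx, orx)
    }

    /// Write path: remember which item a correlation id belongs to.
    pub fn track(&mut self, correlation_id: [u8; 16], item_handle: i32) {
        self.correlation_to_item.insert(correlation_id, item_handle);
    }

    /// Drops events with no context or no mapping instead of inventing item_handle = 0.
    pub fn route(&mut self, ev: StatusEvent) {
        let Some((cid, kind)) = ev.context else { return };
        let Some(item_handle) = self.correlation_to_item.remove(&cid) else { return };
        let server_handle = self.server_handle;
        match kind {
            OperationKind::Write | OperationKind::WriteSecured => {
                let _ = self.on_write_complete_tx.send(WriteCompleteEvent {
                    server_handle, item_handle, statuses: ev.statuses, is_during_recovery: ev.is_during_recovery,
                });
            }
            _ => {
                let _ = self.on_operation_complete_tx.send(OperationCompleteEvent {
                    server_handle, item_handle, statuses: ev.statuses,
                });
            }
        }
    }
}
```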

Result: the C# `LMX_OnWriteComplete(int hLMXServerHandle, int
phItemHandle, ref MXSTATUS_PROXY[] pVars)` callback shape is now
end-to-end-achievable. A consumer calls `LmxClient::write(hServer,
hItem, value, userId)` and drains `client.on_write_complete()`; the
yielded `WriteCompleteEvent` carries the right `(server_handle,
item_handle, statuses, is_during_recovery)` tuple.

Public API: `Session::write_with_handle` + `WriteHandle` are new;
existing signatures unchanged. `cargo public-api` baselines
regenerated under `design/public-api/{mxaccess,mxaccess-compat}.txt`.

Workspace: 765 → 823 tests pass (~58 new tests from F54). Clippy
`-D warnings` clean. Rustdoc `-D warnings` clean.

F54 status in `design/followups.md` moved Open → Resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 07:41:28 -04:00
Joseph Doherty f98ab9846d design/70-risks: record the .NET reference's WriteCompleted half-implementation
R3's verdict gains an aside documenting why the .NET reference's
emulation of the native MxAccess `OnWriteComplete` event has only ever
fired for one exact 5-byte pattern, `00 00 50 80 00`
(= `MxStatus.WriteCompleteOk`).

Verified at:
- `src/MxNativeClient/MxNativeCompatibilityServer.cs:756` —
  `if (!evt.Message.IsMxAccessWriteComplete) return;` gates the
  consumer-facing `WriteCompleted` event.
- `src/MxNativeCodec/NmxOperationStatusMessage.cs:18` —
  `IsMxAccessWriteComplete` requires
  `Format == StatusWord && StatusCode == 0x8050 && CompletionCode == 0x00`.
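
The gate quoted above reduces to a three-part predicate. This Rust rendering is a sketch. `StatusFormat::CompletionOnly` and the field types are assumptions, since only the `StatusWord` format and the two codes come from the .NET source. The `50 80` inside the 5-byte pattern is consistent with `0x8050` stored little-endian.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StatusFormat {
    StatusWord,
    CompletionOnly, // hypothetical name for the 1-byte completion frames
}

#[derive(Debug, Clone, Copy)]
pub struct OperationStatusMessage {
    pub format: StatusFormat,
    pub status_code: u16,
    pub completion_code: u8,
}

impl OperationStatusMessage {
    /// Mirrors IsMxAccessWriteComplete: anything else never reaches WriteCompleted.
    pub fn is_mx_access_write_complete(&self) -> bool {
        self.format == StatusFormat::StatusWord
            && self.status_code == 0x8050
            && self.completion_code == 0x00
    }
}
```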

Every other completion frame is silently dropped — the 1-byte
`0x00`/`0x41`/`0xEF` ones, plus any non-success status word.

This was the underlying reason R3/R4 looked unsolvable for a year:
the answer "we don't know how to map" was actually "the native
compatibility shim deliberately doesn't map these because firing
typed failure events on ambiguous bytes was never a goal."

Path A's `MxStatus::from_packed_u32` (commit `c73a33e`) closes the
asymmetry on the Rust side: `Session::operation_status_events()`
exposes ALL typed outcomes the upstream synthesizer produces, not
just the WriteCompleteOk slice. The Rust port now has strictly
broader operation-status visibility than the .NET reference offered.

Recorded so future contributors don't re-derive this from scratch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 07:13:28 -04:00
Joseph Doherty c73a33edd8 [R3/R4 Path A] mxaccess: port Lmx.dll FUN_10100ce0 synthesizer kernel
Path A landed for R3/R4. The byte->MxStatus synthesizer in Lmx.dll is
FUN_10100ce0 (`analysis/ghidra/exports/Lmx.dll.synthesizer-helpers2-decompile.md`),
a 4-byte u32 LE -> 4-tuple MxStatus decoder used by every NMX-frame
parser in Lmx.dll. The kernel is byte-deterministic and context-free,
so it ports as a pure function -- the operation-tracking state
machine the original verdict deferred is NOT required for synthesis.

Bit layout (per FUN_10100ce0 lines 21-24):
  bit 31:        success    (-1 if set, 0 if clear)
  bits 27..24:   category   (4 bits)
  bits 23..20:   detected_by (4 bits)
  bits 15..0:    detail     (i16 -- low 16 bits, signed)
  bits 30..28, 19..16: reserved/padding
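
The bit layout above maps directly onto a pair of `const fn`s. This is a sketch, not the codec's `MxStatus`:

- It keeps category and detected_by as raw `i16` instead of the `MxStatusCategory` / `MxStatusSource` enums.
- It assumes the encoder writes the reserved bits as zero, so a raw value with reserved bits set does not round-trip.
- `PackedStatus` is a hypothetical name.

```rust
/// Decoded 4-tuple (hypothetical name; fields as in the layout above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PackedStatus {
    pub success: i16,     // -1 if bit 31 is set, else 0
    pub category: i16,    // bits 27..24
    pub detected_by: i16, // bits 23..20
    pub detail: i16,      // bits 15..0, reinterpreted as signed
}

impl PackedStatus {
    /// Decoder: reserved bits 30..28 and 19..16 are ignored.
    pub const fn from_packed_u32(raw: u32) -> Self {
        Self {
            success: if raw & 0x8000_0000 != 0 { -1 } else { 0 },
            category: ((raw >> 24) & 0xF) as i16,
            detected_by: ((raw >> 20) & 0xF) as i16,
            detail: (raw & 0xFFFF) as u16 as i16,
        }
    }

    /// Inverse: reserved bits are written as zero (assumption).
    pub const fn to_packed_u32(self) -> u32 {
        let mut raw = 0u32;
        if self.success != 0 {
            raw |= 0x8000_0000;
        }
        raw |= ((self.category as u32) & 0xF) << 24;
        raw |= ((self.detected_by as u32) & 0xF) << 20;
        raw |= (self.detail as u16) as u32;
        raw
    }
}
```

For example, `0x0120_FFFF` decodes to success 0, category 1, detected_by 2, detail -1. Because both functions are `const`, sentinels can be built at compile time.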

Codec changes:
- MxStatus::from_packed_u32() / ::to_packed_u32() -- the kernel +
  inverse for round-trip parity.
- MxStatus::from_nmx_response_code() -- the constructed-from-response-
  code switch in FUN_1010bd10:741-770 (six proven mappings: 0x01, 0x02
  -> CommunicationError + RequestingNmx; 0x03 -> ConfigurationError +
  RequestingNmx; 0x04 -> ConfigurationError + RespondingNmx; 0x05 ->
  CommunicationError + RespondingNmx; 0x1A -> CommunicationError +
  RequestingNmx).
- MxStatusCategory / MxStatusSource: from_i16/to_i16 promoted to const
  fn so MxStatus::from_packed_u32 can be const.
- NmxOperationStatusMessage::try_parse_process_data_received_body() --
  thin wrapper that peels the outer NmxObservedEnvelope before
  delegating to try_parse_inner. Mirrors
  NmxOperationStatusMessage.TryParseProcessDataReceivedBody (.NET cs:20-32).
- NmxOperationStatusMessage::promote_to_typed() -- entry point that
  returns the existing Status field. Documented as a no-op pass-through
  for now (the 5-byte inner-body wire shape is NOT the same field as
  the 4-byte packed-u32 the kernel decodes); kept for API symmetry.
- 22 new round-trip tests covering the kernel, the response-code
  switch, the proven 0x00/0x41/0xEF completion bytes, and round-trip
  for every canonical sentinel.
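
The six proven response-code mappings fit in one `match`. In this sketch, `Category`, `DetectedBy` and the `u8` input type are assumptions. So is returning `None` for unlisted codes, since the commit does not say what the real switch does for them.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    CommunicationError,
    ConfigurationError,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DetectedBy {
    RequestingNmx,
    RespondingNmx,
}

/// The six mappings from FUN_1010bd10:741-770; anything else is unmapped here.
pub fn from_nmx_response_code(code: u8) -> Option<(Category, DetectedBy)> {
    match code {
        0x01 | 0x02 | 0x1A => Some((Category::CommunicationError, DetectedBy::RequestingNmx)),
        0x03 => Some((Category::ConfigurationError, DetectedBy::RequestingNmx)),
        0x04 => Some((Category::ConfigurationError, DetectedBy::RespondingNmx)),
        0x05 => Some((Category::CommunicationError, DetectedBy::RespondingNmx)),
        _ => None,
    }
}
```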

mxaccess (Session) changes:
- New OperationKind enum (Write/WriteSecured/Read/Subscribe/
  Unsubscribe/Activate/Suspend/Other).
- New OperationContext struct (correlation_id, op_kind, reference,
  retry_count) -- ground for the F54 follow-on per-operation
  correlation work.
- New OperationStatus event type {raw, status, context,
  is_during_recovery}, mirroring MxNativeOperationStatusEvent (cs:73-78)
  with the typed-MxStatus addition.
- Session::operation_status_events() -> broadcast::Receiver<Arc<
  OperationStatus>> + operation_status_stream() Stream variant.
- callback_router() now tries operation-status parsing first, falling
  through to subscription messages -- matches MxNativeSession
  .OnCallbackReceived dispatch order (cs:574,582,590).
- recover_connection() flips a recovery_active counter (Arc<AtomicU32>
  shared with the router) so OperationStatus.is_during_recovery is
  populated correctly. Mirrors MxNativeSession._recoveryActive
  Volatile.Read at cs:573.
- 3 new router tests covering: status-word frame dispatch + typed
  promotion to WriteCompleteOk; completion-only frames stay verbatim;
  is_during_recovery is stamped from the live counter.
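
The shared `recovery_active` counter can be sketched as an RAII guard. The commit only states that `recover_connection()` flips an `Arc<AtomicU32>` shared with the router. The guard type and the `is_during_recovery` helper are hypothetical. A counter rather than a bool keeps the flag raised if two recoveries ever overlap.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// Raises the shared counter for the lifetime of one recovery attempt.
pub struct RecoveryGuard {
    counter: Arc<AtomicU32>,
}

impl RecoveryGuard {
    pub fn enter(counter: &Arc<AtomicU32>) -> Self {
        counter.fetch_add(1, Ordering::SeqCst);
        Self { counter: Arc::clone(counter) }
    }
}

impl Drop for RecoveryGuard {
    fn drop(&mut self) {
        self.counter.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Router side: sampled when each OperationStatus is built.
pub fn is_during_recovery(counter: &AtomicU32) -> bool {
    counter.load(Ordering::SeqCst) > 0
}
```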

Per-operation context tracking (correlating completion frames back to
outstanding writes/subscribes via the correlation_id) is filed as F54
in design/followups.md. The synthesizer kernel itself is byte-
deterministic, so the kernel and the correlation work are decoupled.

Ghidra evidence (the next-ring xref walk beyond FUN_10114a90):
- analysis/ghidra/exports/Lmx.dll.set-attribute-result-xrefs.md --
  xrefs to OnSetAttributeResult / CancelWithStatus / OperationComplete.
- analysis/ghidra/exports/Lmx.dll.vtable-data-xrefs.md -- vtable-slot
  data xrefs for the virtual-dispatch path.
- analysis/ghidra/exports/Lmx.dll.synthesizer-decompile.md --
  ScanOnDemandCallback::OperationComplete/MultipleOperationComplete
  (FUN_1010b990), RemotePlatformResolver::OperationComplete
  (FUN_1010dc80), and the constructed-from-responseCode synthesizer
  in FUN_1010bd10 (lines 698-770). FUN_1010bd10 is the wire-frame
  receiver that drives the synthesis.
- analysis/ghidra/exports/Lmx.dll.synthesizer-helpers-decompile.md --
  FUN_10003fc0 (the <success %d category %d ...> formatter; confirms
  the 4-tuple layout), FUN_1008f150 (dispatch helper).
- analysis/ghidra/exports/Lmx.dll.synthesizer-helpers2-decompile.md --
  FUN_10100ce0 (the kernel itself), FUN_10100bc0 (3xu16 reader),
  FUN_1005e580 (4-byte stream reader), FUN_1010ee00 (sister NMX-frame
  parser using the same kernel).
- analysis/ghidra/exports/Lmx.dll.synthesizer-callers-xrefs.md --
  caller graph; confirms the kernel is called from many wire-frame
  parsers but each parser shares the single 4-byte decoder.

R3/R4 verdict updated in design/70-risks-and-open-questions.md from
"settled at verbatim-preserve" to "settled per Path A". F54 filed in
design/followups.md for the per-operation correlation work.

cargo build / test / clippy -D warnings / RUSTDOCFLAGS=-D warnings doc
all clean. cargo public-api baselines regenerated for mxaccess and
mxaccess-codec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 07:08:36 -04:00
Joseph Doherty 460c61df43 [R3/R4] Path-A trace: synthesizer is in Lmx.dll's NMX-frame decoder
Five-stage Ghidra headless decompile traces the byte-to-MXSTATUS_PROXY
synthesis path end-to-end across LmxProxy.dll and Lmx.dll. New evidence
files committed alongside R3/R4 verdict update:

- analysis/ghidra/exports/LmxProxy.dll.fire-event-xrefs.md
- analysis/ghidra/exports/LmxProxy.dll.status-synthesis-decompile.md
- analysis/ghidra/exports/LmxProxy.dll.mxstatus-safearray-decompile.md
- analysis/ghidra/exports/Lmx.dll.set-attribute-result-decompile.md

Layer-by-layer findings (bytes flow inward; synthesis flows outward):

1. `Lmx.aaDCT` at 0x10178fc0 is `SysAllocString(L"Lmx.aaDCT")` — a
   tracing category BSTR, not a table.
2. `MXSTATUS_PROXY` is a 16-byte marshalled struct (4 × i16 padded
   to i32 boundaries with Pack=4) — the OUTPUT of synthesis, not a
   lookup entry.
3. `LmxProxy.dll` Fire_* event handlers receive already-populated
   `MXSTATUS_PROXY[]` and forward through ATL dispatch — no synthesis.
4. `LmxProxy.dll` Fire_* CALLERS (FUN_1001657f / FUN_10016b50 /
   FUN_10016d4b) call FUN_10003f60(out_safearray, in_status_ptr,
   count=1) which is a VERBATIM memcpy of an existing 14-byte buffer
   into the SAFEARRAY — no transformation.
5. `Lmx.dll`'s `PreboundReference::OnSetAttributeResult` (FUN_10114a90)
   receives an already-populated `short *param_7` status buffer. Log
   line confirms the layout: `<success %d category %d detectedBy %d
   detail %d>`. Dispatches on typed values — synthesis is upstream of
   this function too.

The synthesizer is the NMX-frame decoder in Lmx.dll that calls
OnSetAttributeResult / OnGetAttributeResult / equivalent
OperationComplete handler. The decoder takes raw NMX bytes plus
operation context (item handle, engine state, retry state,
correlation id) and computes the populated MXSTATUS_PROXY. There is
NO static lookup table — synthesis is per-message contextual.

Two viable paths to typed promotion (both substantial; neither a
small codec patch):

- Path A: port the synthesizer. ~1-2 weeks. Requires extending the
  Rust session to track per-operation context (handles, retries,
  correlation ids). Out of V1 scope.
- Path B: empirical capture pairs. ~30 min × 6-10 scenarios. Output
  is a (byte, context → status) mapping that approximates without
  re-implementing. Risk: mapping is only valid for captured contexts.

R3/R4 stay settled at verbatim-preserve. The .NET reference does
the same for the same reason: the synthesizer is too context-
dependent to mirror without porting the entire operation-tracking
state machine.

Reopen criteria sharpened: either (a) a consumer files a concrete
use case for typed promotion of a specific byte+context combination
(Path B's empirical capture for that one combination is the cheapest
answer); or (b) a major-version bump justifies the state-machine
port (Path A).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 06:33:02 -04:00
Joseph Doherty 4dfc0cee65 [R3 + R4 + R8] settle protocol-level risks per Ghidra evidence
Ghidra headless decompile of `Lmx.dll`'s `aaDCT` symbol + the
`LmxProxy.dll` Fire_* event handlers (logs at
`analysis/ghidra/exports/Lmx.dll.aadct-decompile.md` and
`analysis/ghidra/exports/LmxProxy.dll.completion-status-decompile.md`)
settles **R3** and **R4** as "no static byte→status lookup table
exists":

- `Lmx.aaDCT` at `0x10178fc0` is a `SysAllocString(L"Lmx.aaDCT")` into
  a global BSTR — a logging category name, not a table.
- `MXSTATUS_PROXY` is a 4-field struct (success/category/detectedBy/
  detail), used as the marshalled COM event payload — not a static
  array of pre-mapped statuses.
- `Fire_OnDataChange` / `Fire_OnWriteComplete` / `Fire_OperationComplete` /
  `Fire_OnBufferedDataChange` (RVAs 0x15f72, 0x1611f, 0x16271, 0x163c0
  in `LmxProxy.dll`) receive ALREADY-POPULATED `MXSTATUS_PROXY[]`
  arrays — the byte-to-struct synthesis happens upstream in the
  proxy's NMX-callback ingestion code, not via a table lookup. The
  synthesis is per-event computation from operation context (engine
  ids, item handles, retry counters), not a static promotion.
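For reference, an already-populated `MXSTATUS_PROXY[]` payload can be read as consecutive 8-byte entries of four little-endian `i16`s. This is the Success / Category / DetectedBy / Detail layout the F46 Frida hooks use for the `MxStatus*` out parameter. The sketch assumes that field order and x86 little-endian layout; the names are illustrative, not the codec's API.

```rust
/// Illustrative mirror of MXSTATUS_PROXY: four 16-bit fields.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MxStatusProxy {
    pub success: i16,
    pub category: i16,
    pub detected_by: i16,
    pub detail: i16,
}

impl MxStatusProxy {
    pub const SIZE: usize = 8;

    /// Decodes one entry from 8 little-endian bytes (field order assumed).
    pub fn from_le_bytes(b: [u8; 8]) -> Self {
        let field = |i: usize| i16::from_le_bytes([b[i], b[i + 1]]);
        MxStatusProxy {
            success: field(0),
            category: field(2),
            detected_by: field(4),
            detail: field(6),
        }
    }
}

/// Decodes an array of entries. Returns None if the buffer is not a whole
/// number of 8-byte entries.
pub fn decode_status_array(buf: &[u8]) -> Option<Vec<MxStatusProxy>> {
    if buf.len() % MxStatusProxy::SIZE != 0 {
        return None;
    }
    let entries = buf
        .chunks_exact(MxStatusProxy::SIZE)
        .map(|chunk| {
            let mut raw = [0u8; 8];
            raw.copy_from_slice(chunk);
            MxStatusProxy::from_le_bytes(raw)
        })
        .collect();
    Some(entries)
}
```

The decoding itself is trivial. The hard part, per the evidence above, is the upstream computation of the field values, and a table cannot reproduce it.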

R3/R4 status updated from "indefinitely deferred — no Ghidra table"
to "settled — no table exists; verbatim preservation is the canonical
answer." The .NET reference's `NmxOperationStatusMessage.TryParseInner`
+ the Rust port's `mxaccess-codec/src/operation_status.rs` already
match this canonical behaviour; no code change required.

Reopen R3/R4 only if a context-aware capture surfaces per-byte
synthesis logic that depends on operation context. At that point the
codec would need access to the originating operation's context,
which is upstream of the bytes themselves.

**R8** marked permanently deferred. The implementation already parses
NTLM AV pairs per [MS-NLMP] §2.2.2.1 (including the cross-domain
shapes `MsvAvDnsTreeName` / `MsvAvDnsComputerName` carrying the
trusted-domain DNS suffix); what is missing is a live capture, and
that requires a multi-domain Windows lab not available on this dev
host. Same status pattern as F3 in `design/followups.md`.
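For context, the AV_PAIR walk that R8 relies on is sketched below. Per [MS-NLMP] §2.2.2.1, each pair is `AvId` (u16 LE), `AvLen` (u16 LE) and `AvLen` value bytes, and the list ends at `MsvAvEOL`. Name values are UTF-16LE and not NUL-terminated. This is a standalone illustration, not the port's actual parser. The function and struct names are hypothetical; the AvId constants are the spec values.

```rust
/// AvId values from [MS-NLMP] §2.2.2.1 (subset).
pub const MSV_AV_EOL: u16 = 0x0000;
pub const MSV_AV_NB_COMPUTER_NAME: u16 = 0x0001;
pub const MSV_AV_NB_DOMAIN_NAME: u16 = 0x0002;
pub const MSV_AV_DNS_COMPUTER_NAME: u16 = 0x0003;
pub const MSV_AV_DNS_DOMAIN_NAME: u16 = 0x0004;
pub const MSV_AV_DNS_TREE_NAME: u16 = 0x0005;

#[derive(Debug, PartialEq, Eq)]
pub struct AvPair<'a> {
    pub id: u16,
    pub value: &'a [u8],
}

/// Walks AV_PAIR entries until MsvAvEOL. Returns None on truncation or
/// when the terminator is missing.
pub fn parse_av_pairs(mut buf: &[u8]) -> Option<Vec<AvPair<'_>>> {
    let mut pairs = Vec::new();
    loop {
        if buf.len() < 4 {
            return None;
        }
        let id = u16::from_le_bytes([buf[0], buf[1]]);
        let len = u16::from_le_bytes([buf[2], buf[3]]) as usize;
        let rest = &buf[4..];
        if rest.len() < len {
            return None;
        }
        if id == MSV_AV_EOL {
            return Some(pairs);
        }
        pairs.push(AvPair { id, value: &rest[..len] });
        buf = &rest[len..];
    }
}

/// Decodes a UTF-16LE name value (no NUL terminator).
pub fn utf16le_to_string(bytes: &[u8]) -> Option<String> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|c| u16::from_le_bytes([c[0], c[1]]))
        .collect();
    String::from_utf16(&units).ok()
}
```

A cross-domain fixture would exercise exactly these pairs with a second domain's names. Only the live capture is missing.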

Open evidence gaps table updated to reflect:
- Cross-domain NTLM: deferred (R8)
- Ghidra mapping table for completion-only bytes: no table exists
  (R3/R4 settled)
- Activate/Suspend transition (wire): partial (F44 + F46), live re-run
  pending (F50)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 06:23:05 -04:00
Joseph Doherty 0e93e3a8fa design/followups: file F48-F53 for known V1 residuals
After the M6 closeout sweep (F35-F47 all resolved), six residuals
remain that were either documented inline in design docs or implied
by closure language but not formally tracked as F-numbers. File them
explicitly so the next iteration has a clean starting list:

- F48: Execute `cargo publish` for V1 (F43 was dry-run only). Documents
  the 9-crate dependency-ordered publish sequence + the version bump
  from 0.0.0 placeholder to 0.1.0.
- F49: Live verification sweep for F36 (buffered subscribe) +
  F45 (recovery replay) + F47 (unsubscribe skip) + F40 (metrics).
  Closes the gap where these features ship with unit tests but
  weren't live-exercised against AVEVA in the closing iteration.
- F50: Run the F46 Suspend/Activate Frida capture live (script
  ready, capture deferred to maintainer-side).
- F51: Live type-matrix expansion for `asb-subscribe` — Bool /
  Float / Double / String / DateTime / Duration / arrays. F32 was
  closed via "deployable maximum" but the codec supports more types
  than the live matrix exercises.
- F52: Codec performance optimisations from F39 (BytesMut output
  buffer, name-signature cache, session scratch pool). Documented as
  post-V1 in M6-bench-baseline.md; filing them as F-numbers so the
  alloc-count deltas are tracked when they land.
- F53: Enable `#![warn(missing_docs)]` workspace-wide. Deferred from
  F42 — the lint surfaces hundreds of low-priority gaps that need a
  dedicated pass.

R3, R4, R8, and F3 are already tracked in their respective docs (the
risks register + the Open section's permanently-external-blocked entry).

Open section now contains: F3 (permanently external-blocked) +
F48-F53 (V1-residual triage).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 06:11:11 -04:00
Joseph Doherty 25befcb72e design/followups: move F45 + F47 to Resolved (M6 + spawned closures)
rust / build / test / clippy / fmt (push) Has been cancelled
rust / cargo public-api drift check (F41) (push) Has been cancelled
F45 (commit 9b57cf8) and F47 (commit 1a1830f) close the buffered-
subscription recovery + unsubscribe symmetry gap that F36 left open.
The Open section now contains only F3 (cross-domain NTLM Type1/2/3
fixture, permanently external-blocked on this single-domain dev
host — needs multi-domain Windows lab).

This is the end-state for V1: all M0-M6 followups resolved plus the
two M6-spawned follow-ons. F3 stays Open as a documented external
gap; reopen it if the dev host gains a second domain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:59:38 -04:00
Joseph Doherty 1a1830f3bf [F47] mxaccess: unsubscribe skips UnAdvise for buffered subscriptions
Mirrors the .NET reference's `if (!subscription.IsBuffered)` guard
at `MxNativeSession.cs:361-381`. The Rust port previously emitted an
`UnAdvise` frame for both plain and buffered subscriptions; the
buffered server-side registration is unwound by the engine when the
`RegisterReference` handle goes away, so emitting an `UnAdvise` for
buffered entries is at best a no-op extra frame and at worst could
race with the engine's own teardown.

Fix: branch `Session::unsubscribe` on `SubscriptionEntry::mode` (the
discriminator F45 added). For `SubscriptionMode::Buffered { ... }`,
skip the `un_advise` call and proceed directly to registry cleanup.
For `SubscriptionMode::Plain`, retain the previous behaviour.

The registry-entry probe runs first (separate lock acquisition) so
the `is_buffered` decision doesn't hold the NMX-client mutex
unnecessarily; in the common case where the entry is plain, the NMX
lock is still acquired immediately afterward.

The metrics counter `record_unadvise()` still fires on every public
`unsubscribe` call regardless of mode — it tracks consumer-side
unsubscribe rate, not wire-frame rate. That matches what dashboards
expect from the public API.
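Below is a minimal synchronous sketch of that ordering: a probe under the registry lock, a counter that fires on every call, and an UnAdvise only for plain entries. It assumes `std::sync::Mutex` in place of the session's real locks and a recording vector in place of the NMX client. All names are illustrative, not the crate's API.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Illustrative stand-in for the crate-private discriminator.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum SubscriptionMode {
    Plain,
    Buffered { interval_ms: u32 },
}

#[derive(Default)]
pub struct Session {
    registry: Mutex<HashMap<i32, SubscriptionMode>>,
    /// Stands in for the NMX client: records frame names instead of sending them.
    nmx: Mutex<Vec<&'static str>>,
    /// Consumer-side unsubscribe counter (the metrics analogue).
    unsubscribe_calls: AtomicU64,
}

impl Session {
    pub fn insert(&self, handle: i32, mode: SubscriptionMode) {
        self.registry.lock().unwrap().insert(handle, mode);
    }

    /// Returns false if the handle was not registered.
    pub fn unsubscribe(&self, handle: i32) -> bool {
        // Counted on every public call, regardless of mode or outcome.
        self.unsubscribe_calls.fetch_add(1, Ordering::Relaxed);

        // Probe under the registry lock only. The guard is dropped at the
        // end of this statement, before the NMX lock is taken.
        let is_buffered = match self.registry.lock().unwrap().get(&handle) {
            Some(mode) => matches!(mode, SubscriptionMode::Buffered { .. }),
            None => return false,
        };

        if !is_buffered {
            // Plain subscriptions still emit an UnAdvise frame.
            self.nmx.lock().unwrap().push("UnAdvise");
        }
        // Buffered entries skip UnAdvise; registry cleanup runs for both.
        self.registry.lock().unwrap().remove(&handle);
        true
    }

    pub fn frames(&self) -> Vec<&'static str> {
        self.nmx.lock().unwrap().clone()
    }

    pub fn unsubscribe_count(&self) -> u64 {
        self.unsubscribe_calls.load(Ordering::Relaxed)
    }
}
```

A buffered unsubscribe leaves the frame log empty, while the counter still advances. This mirrors what the new unit test asserts about the recorded RPC count.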

New unit test `unsubscribe_skips_un_advise_for_buffered_subscription`
issues a plain subscribe (recorded as 1 RPC), mutates the registry
entry to `SubscriptionMode::Buffered`, calls unsubscribe, and
asserts the recorded RPC count stays at 1 (no UnAdvise emitted).
The existing `subscribe_populates_registry_unsubscribe_clears_it`
test serves as the negative control for the plain branch.

Workspace 794 → 795 tests; clippy clean; rustdoc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:58:57 -04:00
Joseph Doherty 9b57cf8f3b [F45] mxaccess: recovery replay re-issues RegisterReference for buffered subs
`Session::recover_connection_core` previously walked
`SessionInner::subscriptions` and replayed every entry via
`AdviseSupervisory`, which lost the `.property(buffer)` registration
on buffered subscriptions — silently downgrading buffered → plain on
transport rebuild.

Fix:

- New `pub(crate) enum SubscriptionMode { Plain, Buffered { ... } }`
  discriminator carried on each `SubscriptionEntry`. Buffered variant
  retains the un-suffixed reference + the rounded interval (so the
  re-issued buffered registration matches the original cadence) +
  the empty `item_context` / zero `item_handle` matching the wire
  send.
- `Session::subscribe` (plain path) records `SubscriptionMode::Plain`.
  `subscribe_buffered_nmx` records `SubscriptionMode::Buffered { ... }`.
- `recover_connection_core` matches on `entry.mode`. Plain branch
  unchanged. Buffered branch re-applies `.property(buffer)` via
  `to_buffered_item_definition` (idempotent), rebuilds the original
  `NmxReferenceRegistrationMessage` with the saved correlation id +
  `subscribe = true`, and dispatches `register_reference` (kind=
  ItemControl, inner command 0x10) against the replacement
  transport. Mirrors `MxNativeSession.ReAdviseSubscription`
  (`MxNativeSession.cs:538-569`).
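The per-entry replay decision can be sketched as a pure function. The sketch is simplified: the entry stores the un-suffixed reference, and the buffered variant carries the saved correlation id. Plain entries re-advise; buffered entries rebuild a RegisterReference with inner command `0x10` and `subscribe = true`. The suffix helper is idempotent, so it is safe to re-apply. The types and fields are illustrative, not the crate's `pub(crate)` definitions, and `Tank1.Level` is a made-up tag.

```rust
/// Suffix marking a buffered item definition.
pub const BUFFER_SUFFIX: &str = ".property(buffer)";

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum SubscriptionMode {
    Plain,
    /// Keeps what the replay needs to rebuild the original registration.
    Buffered { interval_ms: u32, correlation_id: u32 },
}

#[derive(Clone, Debug)]
pub struct SubscriptionEntry {
    /// The un-suffixed reference as the caller supplied it.
    pub reference: String,
    pub mode: SubscriptionMode,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReplayFrame {
    AdviseSupervisory { reference: String },
    RegisterReference {
        inner_command: u8,
        item_definition: String,
        correlation_id: u32,
        subscribe: bool,
    },
}

/// Idempotent: an already-suffixed definition is returned unchanged.
pub fn to_buffered_item_definition(reference: &str) -> String {
    if reference.ends_with(BUFFER_SUFFIX) {
        reference.to_string()
    } else {
        format!("{}{}", reference, BUFFER_SUFFIX)
    }
}

/// Chooses the replay frame per entry instead of always re-advising.
pub fn replay(entry: &SubscriptionEntry) -> ReplayFrame {
    match &entry.mode {
        SubscriptionMode::Plain => ReplayFrame::AdviseSupervisory {
            reference: entry.reference.clone(),
        },
        SubscriptionMode::Buffered { correlation_id, .. } => ReplayFrame::RegisterReference {
            inner_command: 0x10,
            item_definition: to_buffered_item_definition(&entry.reference),
            correlation_id: *correlation_id,
            subscribe: true,
        },
    }
}
```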

New unit test
`recover_connection_replays_buffered_subscription_via_register_reference`
synthesises a buffered registry entry, installs a `RebuildFactory`
pointing at a recording NMX server, drives `recover_connection`, then
asserts the recorded `TransferData` carries inner command `0x10` (NOT
`0x1f`) with the `.property(buffer)`-suffixed item_definition + the
saved correlation id + subscribe=true.

Side-finding worth filing separately: `Session::unsubscribe`
unconditionally calls `un_advise` for both plain and buffered
entries, but the .NET reference's `Unsubscribe`
(`MxNativeSession.cs:361-381`) skips `UnAdvise` for buffered
(`if (!subscription.IsBuffered)`). Out of scope for F45 (recovery-
only); will file as F47.

Public API unchanged. `SubscriptionMode` + `SubscriptionEntry` stay
`pub(crate)` — `cargo public-api -p mxaccess` baseline is unchanged.

Workspace 793 → 794 tests; clippy clean; rustdoc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:54:30 -04:00
Joseph Doherty 2281309a86 design/followups: move F46 to Resolved (Frida hooks landed) 2026-05-06 05:43:43 -04:00
Joseph Doherty 808fea18a0 [F46] analysis/frida: Suspend/Activate hooks + R5 next-step
Closes the wire-side gap left by capture 077 in F44's R5 walk. The Frida
script now hooks the production LmxProxy.dll dispatchers so a future live
re-run on the AVEVA host can answer "does CLMXProxyServer issue a separate
ORPC method for Suspend/Activate, or are they synthesised client-side?"

Hooks added in `analysis/frida/mx-nmx-trace.js`:
- `LmxProxy.dll!CLMXProxyServer.Suspend`  @ RVA 0x13d9c (FUN_10013d9c)
- `LmxProxy.dll!CLMXProxyServer.Activate` @ RVA 0x14028 (FUN_10014028)

Both RVAs were extracted from
`analysis/ghidra/exports/LmxProxy.dll.string-refs.tsv` rows 119/122 (the
`CLMXProxyServer::Suspend - Server Handle` / `Activate - Server Handle`
log strings each xref one function — same pattern as the existing
AdviseSupervisory hook at 0x142b4). The hooks emit `mx.suspend.begin/end`
and `mx.activate.begin/end` events with serverHandle, itemHandle, and the
`MxStatus*` out parameter decoded as 4 x int16 (Success / Category /
DetectedBy / Detail per `src/MxNativeCodec/MxStatus.cs`). Naming matches
the F46 spec's `mx.<verb>.begin / end` grep convention rather than the
generic `call.enter / leave` shape because we want to pick these out
of large traces without false positives from other LmxProxy entrypoints.

No `Resume` / `Reactivate` exports exist in `LmxProxy.dll` — verified
against `analysis/ghidra/exports/LmxProxy.dll.ghidra.md` (no such string
xrefs) and the decompiled `ILMXProxyServer5` / `ILMXProxyServer4`
interfaces under `analysis/decompiled-mxaccess/ArchestrA/MxAccess/`
(only Suspend and Activate are declared on the dispatch interface).

The script's top-of-file comment now carries the live re-run procedure
(rebuild MxTraceHarness x86, attach Frida with `--scenario=suspend-advised`
then `--scenario=activate-advised`, save under
`captures/NNN-frida-suspend-activate-instrumented/`, grep the new TSV for
`mx.suspend.*` / `mx.activate.*` and correlate with `nmx.enter` events
in the same time window). Live capture is intentionally deferred to the
maintainer per the F46 spec — this dev box has no AVEVA install.

`design/70-risks-and-open-questions.md` R5 status updated:
- Title flag `(filed as F45)` -> `(filed as F46, hook landed pending live re-run)`
  (the docs/M6-buffered-evidence.md footnote referenced F45 from before
  F45 / F46 were de-conflicted by commit 2120dfa).
- New "Next step - F46" paragraph documents the two hooked RVAs, the
  out-param decode shape, and the verified absence of Resume / Reactivate
  symbols.
- "Current best answer" paragraph re-points the residual ORPC question
  at F46.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:42:57 -04:00
Joseph Doherty c7e71e4424 design/followups: move F41 + F43 to Resolved (M6 complete)
rust / build / test / clippy / fmt (push) Has been cancelled
rust / cargo public-api drift check (F41) (push) Has been cancelled
All 10 M6 sub-followups (F35-F44 minus the ones absorbed into F44)
plus F41 + F43 are now resolved. Open section narrows to:
- F45: buffered recovery replay (sub-followup of F36)
- F46: Suspend/Activate wire emission (sub-followup of F44)
- F3: cross-domain NTLM fixture (permanently external-blocked)

M6 closeout: see CHANGELOG.md for the V1 release notes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:34:45 -04:00
Joseph Doherty 7b15c853d1 [F43] release prep: CHANGELOG + cargo publish --dry-run validation
V1 release prep per M6 DoD bullet 6:

**`CHANGELOG.md`** — V1 release notes covering all 9 workspace crates
(`mxaccess-codec`, `mxaccess-rpc`, `mxaccess-asb-nettcp`,
`mxaccess-asb`, `mxaccess-galaxy`, `mxaccess-callback`,
`mxaccess-nmx`, `mxaccess`, `mxaccess-compat`), the M0 → M6
milestone closeouts, deliberate divergences from the .NET reference
(multi-record DataUpdate codec relaxation per F44; buffered single-
sample stream per R2), and known limitations (F3 cross-domain NTLM,
F45 buffered recovery replay, F46 Suspend/Activate wire instrumentation,
R3/R4 OperationComplete trigger). Documents the dependency-ordered
publish sequence (leaf crates first; dependent crates require their
deps to exist on crates.io before their dry-runs can run).

**`cargo publish --dry-run` validation:**
- Leaf crates (mxaccess-codec, mxaccess-rpc, mxaccess-asb-nettcp):
  dry-run passes — tarball builds, metadata complete, license/
  description/repository/rust-version all present via
  `workspace.package`.
- Dependent crates (mxaccess-asb, mxaccess-galaxy, mxaccess-callback,
  mxaccess-nmx, mxaccess, mxaccess-compat): dry-run fails with
  "no matching package" against crates.io. This is expected: the
  registry lookup happens even with `--no-verify`. Validation of
  these crates falls to the build-test-clippy-public_api matrix
  rather than dry-run.

`design/followups.md`: F43 moved to Resolved with a verdict pointing
at this commit + the CHANGELOG.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:33:43 -04:00
Joseph Doherty f0c9dd2214 rust: add version specifiers to workspace path deps for cargo publish 2026-05-06 05:31:57 -04:00
Joseph Doherty 9e57bfd451 [F41 + F44 reconciliation] cargo public-api baselines + multi-record DataUpdate codec
**F41 — public-api baselines (M6 DoD bullet 5)**

`design/public-api/{crate}.txt` for all 9 workspace crates, generated
via `cargo +nightly public-api --simplified -p <crate>`. Per-crate
baseline sizes:
- mxaccess-codec: 2516 lines
- mxaccess-asb:   1258 lines
- mxaccess-rpc:   1273 lines
- mxaccess-asb-nettcp: 708 lines
- mxaccess: 542 lines
- mxaccess-galaxy: 374 lines
- mxaccess-callback: 170 lines
- mxaccess-compat: 123 lines
- mxaccess-nmx: 118 lines

`design/public-api/README.md` documents the update procedure
(install nightly + cargo-public-api, regenerate the affected baseline
on intentional API changes, commit alongside).

`.github/workflows/rust.yml` gains a `public-api` job that runs the
same diff against the committed baseline; drift fails CI with a
unified diff in the log so the PR author can either revert or
update the baseline.

**F44 reconciliation — multi-record DataUpdate codec**

Cherry-picked from the F44 sub-agent's worktree (commit `aec6a0c`):
`subscription_message.rs::parse_data_update` now loops over
`record_count` like `parse_subscription_status` does, accepting any
positive count. The .NET reference still hard-throws on
`record_count != 1`; the Rust codec deliberately diverges per the F44
evidence walk against
`captures/094-frida-buffered-separate-writer/frida-events.tsv:145`:
a `0x33` DataUpdate body with `record_count = 2` and inner_length =
23 (preamble) + 2 * 19 (records) = 61, produced when a separate-session
writer triggered two value changes inside one
`SetBufferedUpdateInterval(1000)` window.

Two new round-trip tests:
- `data_update_multi_record_round_trip` — synthesises a 2-record body,
  parses, asserts both records decode to expected Int32 values.
- `data_update_capture_094_truncated_record_errors` — truncates the
  capture-094 fixture mid-second-record, asserts CodecError::Decode.
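Below is a minimal sketch of the relaxed loop and its truncation failure. It assumes the caller has already read `record_count` from the preamble and that records are fixed-size, as in capture 094. Record contents are treated as opaque, and any trailing bytes are ignored here. `split_records` and `DecodeError` are illustrative names, not `subscription_message.rs`'s actual code. The 23/19 sizes come from the capture-094 walk above.

```rust
/// Sizes from the capture-094 walk: 23-byte preamble, 19-byte records.
pub const PREAMBLE_LEN: usize = 23;
pub const RECORD_LEN: usize = 19;

#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    ZeroRecords,
    Truncated { record: usize },
}

/// Splits the post-preamble bytes into `record_count` fixed-size records.
/// Any positive count is accepted; a body ending mid-record is an error.
pub fn split_records(
    records: &[u8],
    record_count: usize,
    record_len: usize,
) -> Result<Vec<&[u8]>, DecodeError> {
    if record_count == 0 {
        return Err(DecodeError::ZeroRecords);
    }
    let mut out = Vec::with_capacity(record_count);
    let mut rest = records;
    for i in 0..record_count {
        if rest.len() < record_len {
            return Err(DecodeError::Truncated { record: i });
        }
        let (head, tail) = rest.split_at(record_len);
        out.push(head);
        rest = tail;
    }
    Ok(out)
}
```

With two records the expected inner length works out to `PREAMBLE_LEN + 2 * RECORD_LEN = 61`. Cutting the body inside the second record reports `Truncated { record: 1 }`, which is the same failure the truncated-fixture test pins.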

New wire-byte fixtures under `crates/mxaccess-codec/tests/fixtures/m6-buffered/`:
- `094-line145-dataupdate-recordcount2.bin` (57 bytes, `0x33` multi-record)
- `094-line48-substatus-recordcount2.bin` (101 bytes, `0x32` multi-record)

R2 in `design/70-risks-and-open-questions.md` updated from
"single-sample (settled silently)" to "settled per option (a) — codec
relaxed; multi-record observed in production-stack tracing."

`design/followups.md`: F44's verdict updated to reflect the
contradiction-then-relaxation, with reference to the new tests +
fixtures.

Workspace 792 → 794 tests pass; clippy clean; rustdoc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:27:11 -04:00
Joseph Doherty 2120dfa965 design/followups: move F35/F40/F44 to Resolved + de-conflict F45/F46
rust / build / test / clippy / fmt (push) Has been cancelled
After commits d5aa152 (F35) and ad1cf23 (F36+F40+F44), three M6
sub-followups belong under Resolved with concise verdicts referencing
the matching commits.

Sub-agent merge cleanup:
- Two sub-agents independently filed new followup F45 in parallel —
  rename the Suspend/Activate wire-emission gap to F46, leaving the
  buffered-recovery-replay item as F45 (filed by the F36 work since
  it's the more immediate dependent).
- Open section now contains only F41 + F43 + F45 + F46 + F3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:15:13 -04:00
Joseph Doherty ad1cf2351c [F36 + F40 + F44] M6 wave 1: subscribe_buffered (NMX) + metrics + evidence
Three M6 sub-followups landed in this wave (sub-agent worktrees +
manual reconciliation in main):

**F36 — Session::subscribe_buffered (NMX) per R2 single-sample**
- `BufferedOptions::rounded_update_interval_ms()` — 100ms rounding
  helper mirroring MxNativeCompatibilityServer.cs:638
  ((updateInterval + 99) / 100) * 100, saturating on overflow.
- `Session::subscribe_buffered` (public, lib.rs:604) delegates to
  the new private `subscribe_buffered_nmx` which uses the buffered
  RegisterReference path: item_definition suffixed with
  `.property(buffer)`, subscribe=true (no separate
  AdviseSupervisory follow-up — verified against capture 082).
- Per R2 (verified against wwtools/mxaccesscli/docs/api-notes.md), the
  wire semantic is single-sample-per-event with a server-side cadence
  knob; rounded_ms is held client-side only (native MXAccess does not
  emit a separate SetBufferedUpdateInterval RPC, as verified by its
  absence from captures 079/082).
- New crates/mxaccess/examples/subscribe-buffered.rs.
- New crates/mxaccess-codec/tests/buffered_register_reference_parity.rs:
  4 tests (capture 079/082 round-trip, suffix helper, constructive
  forward-build vs capture 082).
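The rounding rule `((updateInterval + 99) / 100) * 100` is easy to get subtly wrong at the edges, so here is a minimal sketch. It assumes a `u32` millisecond argument; the real helper's signature may differ. It also includes a hypothetical zero-rejecting wrapper like the one the compat facade's tests describe.

```rust
/// Rounds an update interval up to the next multiple of 100 ms, i.e.
/// ((ms + 99) / 100) * 100. Saturating arithmetic means inputs within 99
/// of u32::MAX clamp to the largest multiple of 100 instead of overflowing.
pub fn rounded_update_interval_ms(ms: u32) -> u32 {
    (ms.saturating_add(99) / 100).saturating_mul(100)
}

/// Hypothetical compat-layer wrapper: rejects a zero interval before rounding.
pub fn validated_update_interval_ms(ms: u32) -> Result<u32, &'static str> {
    if ms == 0 {
        return Err("update interval must be non-zero");
    }
    Ok(rounded_update_interval_ms(ms))
}
```

For example, 50 → 100, 100 → 100 and 101 → 200. `u32::MAX` lands on 4_294_967_200 rather than wrapping.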

**F40 — Optional metrics feature**
- New crates/mxaccess/src/metrics.rs (275 lines): `pub(crate)`
  thin wrappers (`record_write_latency`, `record_read_latency`,
  `inc_writes`, `inc_reads`, `inc_advises`, `inc_recovery_*`,
  `set_active_subscriptions`, etc.) that compile to no-ops under
  `#[cfg(not(feature = "metrics"))]`. Call sites in session.rs +
  asb_session.rs invoke them unconditionally; the gate is inside
  the wrapper.
- `metrics = { version = "0.24", optional = true }` added to
  workspace + mxaccess crate Cargo.toml.
- Default build: zero metrics dep, zero runtime cost.
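Below is a minimal sketch of the feature-gated wrapper pattern. It assumes the `metrics` 0.24 macro API (`counter!(..).increment(..)`, `histogram!(..).record(..)`). The metric names are hypothetical, not the crate's actual names. In a default build only the `not(feature = "metrics")` bodies compile, so call sites stay unconditional and the calls inline to nothing.

```rust
use std::time::Duration;

#[cfg(feature = "metrics")]
pub(crate) fn inc_writes() {
    metrics::counter!("mxaccess_writes_total").increment(1);
}

#[cfg(not(feature = "metrics"))]
#[inline(always)]
pub(crate) fn inc_writes() {}

#[cfg(feature = "metrics")]
pub(crate) fn record_write_latency(elapsed: Duration) {
    metrics::histogram!("mxaccess_write_latency_seconds").record(elapsed.as_secs_f64());
}

#[cfg(not(feature = "metrics"))]
#[inline(always)]
pub(crate) fn record_write_latency(_elapsed: Duration) {}
```

The gate lives inside the wrapper rather than at each call site, so session code never needs a `#[cfg]` of its own.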

**F44 — Buffered batch + suspend capture decode evidence**
- New docs/M6-buffered-evidence.md: per-capture summary for
  077, 079, 080, 081, 082, 094 — call sequence, key wire bytes,
  R2/R5 verdict.
- R2 confirmed silently as "not a real risk" — single-sample
  observed across 079/080/082/094.
- R5 trigger conditions documented from capture 077: AdviseSupervisory
  + Suspend pair, 1-second intervals, succeeds on enum attributes.
- design/70-risks-and-open-questions.md R2/R5 status updated.

Workspace: 759 → 792 tests, clippy clean, rustdoc -D warnings clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:12:17 -04:00
Joseph Doherty d5aa152b1f [F35] mxaccess-compat: LMXProxyServer-shaped facade (18 methods)
Replace the 8-line `mxaccess-compat` stub with a real `LmxClient`
struct exposing the 18 `ILMXProxyServer5` methods as Rust async fns
on top of `mxaccess::Session` (NMX) and `mxaccess::AsbSession` (ASB).

Handle-table approach
* `Mutex<HashMap<i32, ItemRef>>` for item handles, populated by
  `add_item` / `add_item_2` / `add_buffered_item`, drained by
  `remove_item` / `unregister`.
* `Mutex<HashMap<i32, UserRef>>` for user handles allocated by
  `authenticate_user` / `archestra_user_to_id`.
* `AtomicI32` monotonic counters for both, matching the .NET
  reference's `_nextItemHandle` / `_nextUserHandles` per
  `MxNativeCompatibilityServer.cs:62-63`.
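Below is a minimal sketch of that handle-table shape: a monotonic `AtomicI32` plus a `Mutex`-guarded map. `HandleTable` is a hypothetical generic helper, not the crate's type. Starting at 1 is an assumption rather than the .NET reference's confirmed initial value, and `fetch_add` wraps on overflow.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::Mutex;

/// Illustrative handle table: monotonic counter + guarded map.
pub struct HandleTable<T> {
    next: AtomicI32,
    entries: Mutex<HashMap<i32, T>>,
}

impl<T> HandleTable<T> {
    /// Handles start at 1 here (assumed starting value).
    pub fn new() -> Self {
        HandleTable {
            next: AtomicI32::new(1),
            entries: Mutex::new(HashMap::new()),
        }
    }

    /// Allocates a fresh handle. Handles are never reused, even after removal.
    pub fn add(&self, value: T) -> i32 {
        let handle = self.next.fetch_add(1, Ordering::Relaxed);
        self.entries.lock().unwrap().insert(handle, value);
        handle
    }

    /// Drains one entry, as `remove_item` / `unregister` would.
    pub fn remove(&self, handle: i32) -> Option<T> {
        self.entries.lock().unwrap().remove(&handle)
    }
}
```

Because the counter is separate from the map, removing an entry never frees its number for reuse. That keeps stale handles held by callers from aliasing new items.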

Stream-based event surface (per Q4)
* `OnDataChange` / `OnBufferedDataChange` / `OnWriteComplete` /
  `OperationComplete` exposed as `EventStream<T>: Stream<Item=T>`,
  backed by `tokio::sync::broadcast` channels. Lagged receives
  (`BroadcastStreamRecvError::Lagged`) are skipped silently to keep
  the public `Item` shape ergonomic. These are NOT COM events; that's the post-V1
  `mxaccess-compat-com` crate per design/70-risks-and-open-questions.md
  Q4. The `OperationComplete` channel is wired but no firing path
  is modelled (R3 deferred — no captured byte mapping yet).
* `Advise` / `AdviseSupervisory` spawn a background fan-out task
  that drains the `Subscription` stream and routes each
  `DataChange` to either `on_data_change` or
  `on_buffered_data_change` based on the item's `is_buffered` flag.
  `UnAdvise` / `RemoveItem` abort the task.

Pass-through methods
* `Write` / `Write2` -> `Session::write` / `write_with_timestamp`
  (`userId` ignored — the underlying surface uses engine identity).
* `WriteSecured2` -> `Session::write_secured_at` with both user ids
  always passed (R6: single-user secured = same id twice; never
  gated).
* `AdviseSupervisory` collapses onto `Session::subscribe` because
  the wire path is `AdviseSupervisory` already (`session.rs:1057`),
  matching the .NET reference's `cs:251-259` identical collapse.
* `SetBufferedUpdateInterval` rounds up to nearest 100 ms per
  `MxNativeCompatibilityServer.cs:638`.

Stubbed pass-throughs (mirror upstream `Error::Unsupported`)
* `WriteSecured` (no timestamp) — `Session::write_secured` is
  stubbed at `crates/mxaccess/src/lib.rs:472` (only
  `WriteSecured2`/`0x3A` is ported); workaround documented inline.
* `AddBufferedItem` allocates the handle, but `Advise` for buffered
  items does not yet drive the `Session::subscribe_buffered` cadence
  knob; TODO(F36) is flagged inline at `add_buffered_item` and
  `set_buffered_update_interval`.

Tests (25 new, all green)
* Handle-table lifecycle: Add -> Advise -> UnAdvise -> Remove with
  a mocked subscription task.
* Monotonic handle allocation; context-prefix combination.
* `SetBufferedUpdateInterval` rounding (50 -> 100, 101 -> 200, etc.)
  + zero-rejection.
* Compile-time check that all 18 LMX methods are reachable on
  `LmxClient`.
* Each event stream yields published items; lag silently dropped.
* GUID-shape validation; server-handle mismatch errors.

Build hygiene
* `cargo build -p mxaccess-compat` clean.
* `cargo test -p mxaccess-compat` -> 25 passed.
* `cargo clippy -p mxaccess-compat --all-targets -- -D warnings` clean.
* `RUSTDOCFLAGS=-D warnings cargo doc -p mxaccess-compat --no-deps` clean.

Deferred / TODOs
* TODO(F36): wire `set_buffered_update_interval` cadence into the
  `advise` path for buffered items.
* TODO(R3): plumb a real trigger into `on_operation_complete` once
  the byte mapping lands.
* TODO(wave 2): live integration tests against AVEVA.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:06:26 -04:00
Joseph Doherty a1c4c6203e design/followups: move F37/F38/F39/F42 to Resolved
rust / build / test / clippy / fmt (push) Has been cancelled
Four M6 sub-followups closed in this session — moved to Resolved
section with concise verdicts referencing the matching commits:

- F37 (commit 34045c2): ASB subscribe_buffered returns Unsupported
- F38 (commit 71c69b8): counting-allocator bench harness + R12
  baseline showing the target is already met
- F39 (closed-via-F38): zero-copy pass not needed for R12 target
  (1-4 allocs/op across the proven matrix); remaining
  optimisations documented as post-V1 work
- F42 (commit e79e289): cargo doc --workspace --no-deps clean

Open M6 work remaining: F35 (compat facade), F36 (NMX
subscribe_buffered), F40 (metrics feature), F41 (public-api
baseline), F43 (release prep), F44 (capture decode evidence).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:47:38 -04:00
Joseph Doherty 71c69b80c6 [F38] mxaccess-codec: counting-allocator bench harness + R12 baseline
Hand-rolled GlobalAlloc wrapper around System that tracks allocs +
bytes + deallocs via two atomics. Each scenario runs 10k iterations
after a 1k warm-up; output is a markdown table with allocs/op,
bytes/op, deallocs/op.

Why hand-rolled (not dhat/criterion): R12 gates on a single number
("< 5 allocs/write"). dhat is heap-profiling-oriented (call-stack
attribution, JSON snapshots); criterion measures wall-clock latency
which is reported-but-not-gated per 60-roadmap.md:104. A 50-line
GlobalAlloc + atomic counters is the simplest thing that answers
the gate.
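A counting allocator of the kind described is small enough to show whole. The sketch below forwards to `System` and counts allocations, bytes and deallocations in atomics. It measures mean allocs per call after a warm-up and checks the `< 5` gate. The static and function names are illustrative, not the harness's. This version uses three atomics and leaves `realloc` / `alloc_zeroed` on the trait defaults, which route through `alloc` / `dealloc`.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Forwards to the system allocator while counting calls and bytes.
pub struct CountingAlloc;

pub static ALLOCS: AtomicUsize = AtomicUsize::new(0);
pub static BYTES: AtomicUsize = AtomicUsize::new(0);
pub static DEALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        DEALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }
    // realloc/alloc_zeroed use the trait defaults, which call alloc/dealloc
    // above, so a realloc counts as one alloc plus one dealloc.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` `warmup` times, then returns mean allocations per call over `iters`.
pub fn allocs_per_op<F: FnMut()>(warmup: usize, iters: usize, mut f: F) -> f64 {
    for _ in 0..warmup {
        f();
    }
    let before = ALLOCS.load(Ordering::Relaxed);
    for _ in 0..iters {
        f();
    }
    let after = ALLOCS.load(Ordering::Relaxed);
    (after - before) as f64 / iters as f64
}

/// The R12 gate: fewer than 5 allocations per write.
pub fn passes_gate(allocs_per_op: f64) -> bool {
    allocs_per_op < 5.0
}
```

A bench `main` would call `allocs_per_op(1_000, 10_000, || { /* encode */ })` per scenario and `std::process::exit(1)` when `passes_gate` is false. The counters are process-wide, so other threads allocating during a run can only inflate the number.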

Run: `cargo bench -p mxaccess-codec`

Baseline numbers (release, Windows x64):
- Bool write:    1.00 allocs/op
- Int32 write:   2.00 allocs/op
- Float32 write: 2.00 allocs/op
- Float64 write: 2.00 allocs/op
- String write:  4.00 allocs/op (5-char string)
- Handle from_names: 2.00 allocs/op
- DataUpdate decode: 1.00 alloc/op

R12's < 5 allocs/write target is **already met** across the proven
matrix without any zero-copy work. The bench gates on this — any
write_message::encode scenario at >= 5 allocs/op exits the harness
with code 1.

Companion: `design/M6-bench-baseline.md` documents the numbers,
explains the per-scenario breakdown, and tightens F39's scope from
"hit the target" to "nice-to-have optimisations" (BytesMut output
buffer, name-signature cache, session-level scratch pool).

Workspace: 759 tests still pass; clippy --benches clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:45:33 -04:00
Joseph Doherty e79e289743 [F42] cargo doc --workspace --no-deps clean (0 warnings)
Fix all 33 rustdoc warnings across the workspace:

- Unresolved intra-doc links: rewrite [`name`] → either backtick text
  (when not actually a link) or fully-qualified `[Type::method]` /
  `[crate::module::name]` form. Affected: mxaccess-codec
  (asb_variant, item_control, metadata_query, observed_write_template,
  reference_handle, write_message), mxaccess-rpc (pdu), mxaccess-nmx
  (client), mxaccess-asb-nettcp (nmf), mxaccess-callback (exporter),
  mxaccess (asb_session, session, lib).
- Bracket-text being interpreted as link refs (e.g. `body[17]` →
  `` `body[17]` ``).
- Private-item references in public docs (CALLBACK_BROADCAST_CAPACITY,
  recover_connection_core, mxvalue_to_writevalue) reduced to
  backtick-text since they aren't part of the public API.

`RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps` now
exits clean. Workspace 759 tests pass; clippy clean.

Defers `#![warn(missing_docs)]` lint to a future pass — the cleanup
target is the broken-link warnings, which are signal; missing-docs
would surface hundreds of low-priority public-item gaps that are out
of scope for this F-number.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:39:51 -04:00
Joseph Doherty 34045c2f6d [F37] mxaccess: AsbSession::subscribe_buffered returns Unsupported
ASB has no `SetBufferedUpdateInterval` analogue — the per-monitored-
item `MinimalMonitoredItem::sample_interval` plays the cadence-knob
role. Calling `subscribe_buffered` on an ASB session now returns
`Error::Unsupported { transport: TransportKind::Asb, operation: ... }`
synchronously, without touching the wire.

The error-construction logic is split into a free fn
`unsupported_subscribe_buffered_error()` so the gate's exact shape
is unit-testable without spinning up a live authenticator + transport
fake. New unit test asserts both the variant tag and that the
operation message names the unsupported method + hints at the
`sample_interval` analogue.
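The split-out constructor pattern looks roughly like the sketch below. The `Error` / `TransportKind` shapes are simplified, hypothetical stand-ins for the crate's definitions, and the message wording is illustrative. The point is that a test can pattern-match the variant and inspect the operation text without any transport.

```rust
use std::fmt;

/// Illustrative transport tag (not the crate's real definition).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TransportKind {
    Nmx,
    Asb,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    Unsupported { transport: TransportKind, operation: String },
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Unsupported { transport, operation } => {
                write!(f, "{operation} is not supported on {transport:?}")
            }
        }
    }
}

/// Free function so the error's exact shape is unit-testable without a
/// live authenticator or transport fake.
pub fn unsupported_subscribe_buffered_error() -> Error {
    Error::Unsupported {
        transport: TransportKind::Asb,
        operation: "subscribe_buffered (ASB has no SetBufferedUpdateInterval; \
                    set MinimalMonitoredItem::sample_interval instead)"
            .to_string(),
    }
}
```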

Workspace 758 → 759 tests, clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:32:45 -04:00
Joseph Doherty 2546710604 design/followups: add F35-F44 for M6 implementation plan
10 new followups decompose M6 (compatibility shim + production
hardening) into parallel-safe sub-streams:

- F35: mxaccess-compat LMXProxyServer-shaped facade (18 methods over
  Session/AsbSession)
- F36: Session::subscribe_buffered NMX path per R2 single-sample
- F37: ASB subscribe_buffered capability gate
- F38: counting-allocator cargo bench harness for R12 target
- F39: zero-copy codec pass (depends on F38)
- F40: optional metrics feature
- F41: cargo public-api baseline (depends on F35/F36/F37/F39/F40)
- F42: cargo doc cleanup pass
- F43: cargo publish --dry-run all crates (depends on F41)
- F44: decode buffered batch + suspend captures (077, 079-082, 094)
  for R2/R5 evidence

Parallelization: Wave 1 = F35/F36/F37/F38/F40/F42/F44 (different
crates / different concerns); Wave 2 = F39 (needs F38's bench);
Wave 3 = F41 (needs API stable); Wave 4 = F43 (release).
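The wave grouping follows mechanically from the dependencies listed above. Each followup lands one wave after its latest dependency, and followups with no dependencies land in Wave 1. The `waves` helper below is an illustrative sketch, not part of the repo. It assumes an acyclic graph in which every dependency is itself listed.

```rust
use std::collections::HashMap;

/// Groups items into waves: an item's wave is one more than the latest wave
/// among its dependencies (no dependencies means wave 1). Assumes the graph
/// is acyclic and every dependency is listed; panics on an unlisted one.
pub fn waves(deps: &[(&str, &[&str])]) -> Vec<Vec<String>> {
    let graph: HashMap<&str, &[&str]> = deps.iter().map(|(k, v)| (*k, *v)).collect();
    let mut memo: HashMap<String, usize> = HashMap::new();

    fn wave_of(item: &str, graph: &HashMap<&str, &[&str]>, memo: &mut HashMap<String, usize>) -> usize {
        if let Some(&w) = memo.get(item) {
            return w;
        }
        let w = 1 + graph[item]
            .iter()
            .map(|d| wave_of(d, graph, memo))
            .max()
            .unwrap_or(0);
        memo.insert(item.to_string(), w);
        w
    }

    let mut out: Vec<Vec<String>> = Vec::new();
    for (item, _) in deps {
        let w = wave_of(item, &graph, &mut memo);
        if out.len() < w {
            out.resize(w, Vec::new());
        }
        out[w - 1].push(item.to_string());
    }
    out
}
```

Feeding in F39 → F38, F41 → F35/F36/F37/F39/F40 and F43 → F41 reproduces the four waves above. Within a wave, items keep their input order.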

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:28:38 -04:00