Commit Graph

102 Commits

Author SHA1 Message Date
Joseph Doherty 9e57bfd451 [F41 + F44 reconciliation] cargo public-api baselines + multi-record DataUpdate codec
**F41 — public-api baselines (M6 DoD bullet 5)**

`design/public-api/{crate}.txt` for all 9 workspace crates, generated
via `cargo +nightly public-api --simplified -p <crate>`. Per-crate
baseline sizes:
- mxaccess-codec: 2516 lines
- mxaccess-asb:   1258 lines
- mxaccess-rpc:   1273 lines
- mxaccess-asb-nettcp: 708 lines
- mxaccess: 542 lines
- mxaccess-galaxy: 374 lines
- mxaccess-callback: 170 lines
- mxaccess-compat: 123 lines
- mxaccess-nmx: 118 lines

`design/public-api/README.md` documents the update procedure
(install nightly + cargo-public-api, regenerate the affected baseline
on intentional API changes, commit alongside).

`.github/workflows/rust.yml` gains a `public-api` job that runs the
same diff against the committed baseline; drift fails CI with a
unified diff in the log so the PR author can either revert or
update the baseline.

**F44 reconciliation — multi-record DataUpdate codec**

Cherry-picked from the F44 sub-agent's worktree (commit `aec6a0c`):
`subscription_message.rs::parse_data_update` now loops over
`record_count` like `parse_subscription_status` does, accepting any
positive count. The .NET reference still hard-throws on
`record_count != 1`; the Rust codec deliberately diverges per the F44
evidence walk against
`captures/094-frida-buffered-separate-writer/frida-events.tsv:145`: a
`0x33` DataUpdate body with `record_count = 2` and
inner_length = 23 (preamble) + 2 * 19 (records) = 61, produced after a
separate-session writer triggered two value changes inside one
`SetBufferedUpdateInterval(1000)` window.
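
The length arithmetic above can be sketched as a record splitter. This is a minimal illustration, not the codec's actual `parse_data_update`: the names `split_records` and `SplitError` are hypothetical, and treating every record as exactly 19 bytes is an assumption that only matches records shaped like the capture-094 body.

```rust
// Sizes quoted in the commit message: 23-byte preamble, 19-byte records.
// Fixed-size records are an assumption taken from that single capture.
const PREAMBLE_LEN: usize = 23;
const RECORD_LEN: usize = 19;

#[derive(Debug, PartialEq)]
enum SplitError {
    /// `record_count` must be positive.
    ZeroRecords,
    /// The length computation overflowed `usize`.
    Overflow,
    /// The body is shorter than preamble + records.
    Truncated { needed: usize, got: usize },
}

/// Hypothetical helper: slices a DataUpdate-like body into `record_count`
/// fixed-size records, rejecting zero counts and truncated input.
fn split_records(body: &[u8], record_count: usize) -> Result<Vec<&[u8]>, SplitError> {
    if record_count == 0 {
        return Err(SplitError::ZeroRecords);
    }
    let needed = record_count
        .checked_mul(RECORD_LEN)
        .and_then(|n| n.checked_add(PREAMBLE_LEN))
        .ok_or(SplitError::Overflow)?;
    if body.len() < needed {
        return Err(SplitError::Truncated { needed, got: body.len() });
    }
    Ok(body[PREAMBLE_LEN..needed].chunks_exact(RECORD_LEN).collect())
}
```

With a 61-byte body and `record_count = 2` the helper yields two 19-byte slices; cutting the body mid-second-record produces `Truncated`, which is the shape of failure the truncated-fixture test exercises.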

Two new round-trip tests:
- `data_update_multi_record_round_trip` — synthesises a 2-record body,
  parses, asserts both records decode to expected Int32 values.
- `data_update_capture_094_truncated_record_errors` — truncates the
  capture-094 fixture mid-second-record, asserts CodecError::Decode.

New wire-byte fixtures under `crates/mxaccess-codec/tests/fixtures/m6-buffered/`:
- `094-line145-dataupdate-recordcount2.bin` (57 bytes, `0x33` multi-record)
- `094-line48-substatus-recordcount2.bin` (101 bytes, `0x32` multi-record)

R2 in `design/70-risks-and-open-questions.md` updated from
"single-sample (settled silently)" to "settled per option (a) — codec
relaxed; multi-record observed in production-stack tracing."

`design/followups.md`: F44's verdict updated to reflect the
contradiction-then-relaxation, with reference to the new tests +
fixtures.

Workspace 792 → 794 tests pass; clippy clean; rustdoc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:27:11 -04:00
Joseph Doherty 2120dfa965 design/followups: move F35/F40/F44 to Resolved + de-conflict F45/F46
rust / build / test / clippy / fmt (push) Has been cancelled
After commits d5aa152 (F35) and ad1cf23 (F36+F40+F44), three M6
sub-followups belong under Resolved with concise verdicts referencing
the matching commits.

Sub-agent merge cleanup:
- Two sub-agents independently filed new followup F45 in parallel —
  rename the Suspend/Activate wire-emission gap to F46, leaving the
  buffered-recovery-replay item as F45 (filed by the F36 work since
  it's the more immediate dependent).
- Open section now contains only F41 + F43 + F45 + F46 + F3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:15:13 -04:00
Joseph Doherty ad1cf2351c [F36 + F40 + F44] M6 wave 1: subscribe_buffered (NMX) + metrics + evidence
Three M6 sub-followups landed in this wave (sub-agent worktrees +
manual reconciliation in main):

**F36 — Session::subscribe_buffered (NMX) per R2 single-sample**
- `BufferedOptions::rounded_update_interval_ms()` — 100ms rounding
  helper mirroring MxNativeCompatibilityServer.cs:638
  ((updateInterval + 99) / 100) * 100, saturating on overflow.
- `Session::subscribe_buffered` (public, lib.rs:604) delegates to
  the new private `subscribe_buffered_nmx` which uses the buffered
  RegisterReference path: item_definition suffixed with
  `.property(buffer)`, subscribe=true (no separate
  AdviseSupervisory follow-up — verified against capture 082).
- Per R2 (verified at wwtools/mxaccesscli/docs/api-notes.md), the wire
  semantic is single-sample-per-event with a server-side cadence
  knob; rounded_ms is held client-side only (native MXAccess does
  not emit a separate SetBufferedUpdateInterval RPC, verified by its
  absence in the 079/082 captures).
- New crates/mxaccess/examples/subscribe-buffered.rs.
- New crates/mxaccess-codec/tests/buffered_register_reference_parity.rs:
  4 tests (capture 079/082 round-trip, suffix helper, constructive
  forward-build vs capture 082).
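
The 100 ms rounding rule quoted above, `((updateInterval + 99) / 100) * 100`, can be sketched as a free function. The `u32` parameter type and the use of `saturating_add` for the "saturating on overflow" behaviour are assumptions; the real helper is a method on `BufferedOptions` whose exact signature is not shown in this log.

```rust
/// Rounds an interval up to the next multiple of 100 ms.
/// `saturating_add` keeps the `+ 99` step from wrapping near `u32::MAX`
/// (assumed interpretation of "saturating on overflow").
fn rounded_update_interval_ms(update_interval_ms: u32) -> u32 {
    update_interval_ms.saturating_add(99) / 100 * 100
}
```

Under this sketch 50 rounds to 100, 100 stays at 100, and 101 rounds to 200.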

**F40 — Optional metrics feature**
- New crates/mxaccess/src/metrics.rs (275 lines): `pub(crate)`
  thin wrappers (`record_write_latency`, `record_read_latency`,
  `inc_writes`, `inc_reads`, `inc_advises`, `inc_recovery_*`,
  `set_active_subscriptions`, etc.) that compile to no-ops under
  `#[cfg(not(feature = "metrics"))]`. Call sites in session.rs +
  asb_session.rs invoke them unconditionally; the gate is inside
  the wrapper.
- `metrics = { version = "0.24", optional = true }` added to
  workspace + mxaccess crate Cargo.toml.
- Default build: zero metrics dep, zero runtime cost.
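
The "gate inside the wrapper" pattern can be sketched as follows. The metric names and the `timed_write` caller are hypothetical; only the wrapper names `record_write_latency` and `inc_writes` come from the commit message, and the feature-enabled bodies assume the `metrics` 0.24 macro API.

```rust
use std::time::{Duration, Instant};

// Feature-enabled variants: forward to the `metrics` crate.
// The metric names here are illustrative assumptions.
#[cfg(feature = "metrics")]
pub(crate) fn record_write_latency(elapsed: Duration) {
    metrics::histogram!("mxaccess_write_latency_seconds").record(elapsed.as_secs_f64());
}

#[cfg(feature = "metrics")]
pub(crate) fn inc_writes() {
    metrics::counter!("mxaccess_writes_total").increment(1);
}

// Default build: same signatures, empty bodies, no dependency pulled in.
#[cfg(not(feature = "metrics"))]
#[inline(always)]
pub(crate) fn record_write_latency(_elapsed: Duration) {}

#[cfg(not(feature = "metrics"))]
#[inline(always)]
pub(crate) fn inc_writes() {}

/// Hypothetical call site: invokes the wrappers unconditionally,
/// with no `cfg` attributes of its own.
fn timed_write(value: i32) -> i32 {
    let started = Instant::now();
    // ... the actual write would happen here ...
    record_write_latency(started.elapsed());
    inc_writes();
    value
}
```

Keeping the `cfg` split in one module means call sites never need their own feature attributes, which is the property the commit describes for session.rs and asb_session.rs.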

**F44 — Buffered batch + suspend capture decode evidence**
- New docs/M6-buffered-evidence.md: per-capture summary for
  077, 079, 080, 081, 082, 094 — call sequence, key wire bytes,
  R2/R5 verdict.
- R2 confirmed silently as "not a real risk" — single-sample
  observed across 079/080/082/094.
- R5 trigger conditions documented from capture 077: AdviseSupervisory
  + Suspend pair, 1-second intervals, succeeds on enum attributes.
- design/70-risks-and-open-questions.md R2/R5 status updated.

Workspace: 759 → 792 tests, clippy clean, rustdoc -D warnings clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:12:17 -04:00
Joseph Doherty d5aa152b1f [F35] mxaccess-compat: LMXProxyServer-shaped facade (18 methods)
Replace the 8-line `mxaccess-compat` stub with a real `LmxClient`
struct exposing the 18 `ILMXProxyServer5` methods as Rust async fns
on top of `mxaccess::Session` (NMX) and `mxaccess::AsbSession` (ASB).

Handle-table approach
* `Mutex<HashMap<i32, ItemRef>>` for item handles, populated by
  `add_item` / `add_item_2` / `add_buffered_item`, drained by
  `remove_item` / `unregister`.
* `Mutex<HashMap<i32, UserRef>>` for user handles allocated by
  `authenticate_user` / `archestra_user_to_id`.
* `AtomicI32` monotonic counters for both, matching the .NET
  reference's `_nextItemHandle` / `_nextUserHandles` per
  `MxNativeCompatibilityServer.cs:62-63`.
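
A minimal sketch of the handle-table approach for item handles, assuming a simplified `ItemRef` with only a name and a buffered flag. The struct fields, the starting handle value of 1, and the method bodies are illustrative assumptions, not the crate's actual definitions.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::Mutex;

/// Simplified stand-in for the real `ItemRef` (fields are assumptions).
#[derive(Debug, Clone, PartialEq)]
struct ItemRef {
    name: String,
    is_buffered: bool,
}

struct HandleTable {
    next_item_handle: AtomicI32,
    items: Mutex<HashMap<i32, ItemRef>>,
}

impl HandleTable {
    fn new() -> Self {
        HandleTable {
            // Starting at 1 is an assumption; the log only says "monotonic".
            next_item_handle: AtomicI32::new(1),
            items: Mutex::new(HashMap::new()),
        }
    }

    /// Allocates the next handle and records the item under it.
    fn add_item(&self, name: &str, is_buffered: bool) -> i32 {
        let handle = self.next_item_handle.fetch_add(1, Ordering::Relaxed);
        let item = ItemRef { name: name.to_string(), is_buffered };
        self.items.lock().unwrap().insert(handle, item);
        handle
    }

    /// Drains the entry; returns `None` for an unknown or already-removed handle.
    fn remove_item(&self, handle: i32) -> Option<ItemRef> {
        self.items.lock().unwrap().remove(&handle)
    }
}
```

Because the counter only moves forward, a removed handle is never reissued within the lifetime of the table.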

Stream-based event surface (per Q4)
* `OnDataChange` / `OnBufferedDataChange` / `OnWriteComplete` /
  `OperationComplete` exposed as `EventStream<T>: Stream<Item=T>`,
  backed by `tokio::sync::broadcast` channels. Lag silently skips
  past `BroadcastStream::Lagged` to keep the public `Item` shape
  ergonomic. NOT COM events — that's the post-V1
  `mxaccess-compat-com` crate per design/70-risks-and-open-questions.md
  Q4. The `OperationComplete` channel is wired but no firing path
  is modelled (R3 deferred — no captured byte mapping yet).
* `Advise` / `AdviseSupervisory` spawn a background fan-out task
  that drains the `Subscription` stream and routes each
  `DataChange` to either `on_data_change` or
  `on_buffered_data_change` based on the item's `is_buffered` flag.
  `UnAdvise` / `RemoveItem` abort the task.

Pass-through methods
* `Write` / `Write2` -> `Session::write` / `write_with_timestamp`
  (`userId` ignored — the underlying surface uses engine identity).
* `WriteSecured2` -> `Session::write_secured_at` with both user ids
  always passed (R6: single-user secured = same id twice; never
  gated).
* `AdviseSupervisory` collapses onto `Session::subscribe` because
  the wire path is `AdviseSupervisory` already (`session.rs:1057`),
  matching the .NET reference's `cs:251-259` identical collapse.
* `SetBufferedUpdateInterval` rounds up to nearest 100 ms per
  `MxNativeCompatibilityServer.cs:638`.

Stubbed pass-throughs (mirror upstream `Error::Unsupported`)
* `WriteSecured` (no timestamp) — `Session::write_secured` is
  stubbed at `crates/mxaccess/src/lib.rs:472` (only
  `WriteSecured2`/`0x3A` is ported); workaround documented inline.
* `AddBufferedItem` allocates the handle, but `Advise` for buffered
  items does not yet drive the `Session::subscribe_buffered` cadence
  knob; TODO(F36) is flagged inline at `add_buffered_item` and
  `set_buffered_update_interval`.

Tests (25 new, all green)
* Handle-table lifecycle: Add -> Advise -> UnAdvise -> Remove with
  a mocked subscription task.
* Monotonic handle allocation; context-prefix combination.
* `SetBufferedUpdateInterval` rounding (50 -> 100, 101 -> 200, etc.)
  + zero-rejection.
* Compile-time check that all 18 LMX methods are reachable on
  `LmxClient`.
* Each event stream yields published items; lag silently dropped.
* GUID-shape validation; server-handle mismatch errors.

Build hygiene
* `cargo build -p mxaccess-compat` clean.
* `cargo test -p mxaccess-compat` -> 25 passed.
* `cargo clippy -p mxaccess-compat --all-targets -- -D warnings` clean.
* `RUSTDOCFLAGS=-D warnings cargo doc -p mxaccess-compat --no-deps` clean.

Deferred / TODOs
* TODO(F36): wire `set_buffered_update_interval` cadence into the
  `advise` path for buffered items.
* TODO(R3): plumb a real trigger into `on_operation_complete` once
  the byte mapping lands.
* TODO(wave 2): live integration tests against AVEVA.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 05:06:26 -04:00
Joseph Doherty a1c4c6203e design/followups: move F37/F38/F39/F42 to Resolved
rust / build / test / clippy / fmt (push) Has been cancelled
Four M6 sub-followups closed in this session — moved to Resolved
section with concise verdicts referencing the matching commits:

- F37 (commit 34045c2): ASB subscribe_buffered returns Unsupported
- F38 (commit 71c69b8): counting-allocator bench harness + R12
  baseline showing the target is already met
- F39 (closed-via-F38): zero-copy pass not needed for R12 target
  (1-4 allocs/op across the proven matrix); remaining
  optimisations documented as post-V1 work
- F42 (commit e79e289): cargo doc --workspace --no-deps clean

Open M6 work remaining: F35 (compat facade), F36 (NMX
subscribe_buffered), F40 (metrics feature), F41 (public-api
baseline), F43 (release prep), F44 (capture decode evidence).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:47:38 -04:00
Joseph Doherty 71c69b80c6 [F38] mxaccess-codec: counting-allocator bench harness + R12 baseline
Hand-rolled GlobalAlloc wrapper around System that tracks allocs +
bytes + deallocs via two atomics. Each scenario runs 10k iterations
after a 1k warm-up; output is a markdown table with allocs/op,
bytes/op, deallocs/op.

Why hand-rolled (not dhat/criterion): R12 gates on a single number
("< 5 allocs/write"). dhat is heap-profiling-oriented (call-stack
attribution, JSON snapshots); criterion measures wall-clock latency
which is reported-but-not-gated per 60-roadmap.md:104. A 50-line
GlobalAlloc + atomic counters is the simplest thing that answers
the gate.

Run: `cargo bench -p mxaccess-codec`

Baseline numbers (release, Windows x64):
- Bool write:    1.00 allocs/op
- Int32 write:   2.00 allocs/op
- Float32 write: 2.00 allocs/op
- Float64 write: 2.00 allocs/op
- String write:  4.00 allocs/op (5-char string)
- Handle from_names: 2.00 allocs/op
- DataUpdate decode: 1.00 allocs/op

R12's < 5 allocs/write target is **already met** across the proven
matrix without any zero-copy work. The bench gates on this — any
write_message::encode scenario at >= 5 allocs/op exits the harness
with code 1.
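
A counting allocator of the kind described can be sketched with the standard library alone. This is not the harness's code: the counter layout (one atomic per counter here, for clarity), the `allocs_per_op` helper, and `gate_passes` are illustrative assumptions.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

static ALLOCS: AtomicUsize = AtomicUsize::new(0);
static BYTES: AtomicUsize = AtomicUsize::new(0);
static DEALLOCS: AtomicUsize = AtomicUsize::new(0);

/// Thin wrapper around the system allocator that bumps counters.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        DEALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `op` for `warm_up` unmeasured iterations, then reports the mean
/// number of allocations per call over `iterations` measured calls.
fn allocs_per_op<F: FnMut()>(warm_up: usize, iterations: usize, mut op: F) -> f64 {
    for _ in 0..warm_up {
        op();
    }
    let before = ALLOCS.load(Ordering::Relaxed);
    for _ in 0..iterations {
        op();
    }
    let after = ALLOCS.load(Ordering::Relaxed);
    (after - before) as f64 / iterations as f64
}

/// Mirrors the stated gate: a scenario at 5 or more allocs/op fails.
fn gate_passes(per_op: f64) -> bool {
    per_op < 5.0
}
```

The counters are process-wide, so allocations made on other threads during the measured window are counted too; a harness that wants exact per-scenario numbers would need to run scenarios single-threaded.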

Companion: `design/M6-bench-baseline.md` documents the numbers,
explains the per-scenario breakdown, and tightens F39's scope from
"hit the target" to "nice-to-have optimisations" (BytesMut output
buffer, name-signature cache, session-level scratch pool).

Workspace: 759 tests still pass; clippy --benches clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:45:33 -04:00
Joseph Doherty e79e289743 [F42] cargo doc --workspace --no-deps clean (0 warnings)
Fix all 33 rustdoc warnings across the workspace:

- Unresolved intra-doc links: rewrite [`name`] → either backtick text
  (when not actually a link) or fully-qualified `[Type::method]` /
  `[crate::module::name]` form. Affected: mxaccess-codec
  (asb_variant, item_control, metadata_query, observed_write_template,
  reference_handle, write_message), mxaccess-rpc (pdu), mxaccess-nmx
  (client), mxaccess-asb-nettcp (nmf), mxaccess-callback (exporter),
  mxaccess (asb_session, session, lib).
- Bracket-text being interpreted as link refs (e.g. `body[17]` →
  `` `body[17]` ``).
- Private-item references in public docs (CALLBACK_BROADCAST_CAPACITY,
  recover_connection_core, mxvalue_to_writevalue) reduced to
  backtick-text since they aren't part of the public API.

`RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps` now
exits clean. Workspace 759 tests pass; clippy clean.

Defers `#![warn(missing_docs)]` lint to a future pass — the cleanup
target is the broken-link warnings, which are signal; missing-docs
would surface hundreds of low-priority public-item gaps that are out
of scope for this F-number.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:39:51 -04:00
Joseph Doherty 34045c2f6d [F37] mxaccess: AsbSession::subscribe_buffered returns Unsupported
ASB has no `SetBufferedUpdateInterval` analogue — the per-monitored-
item `MinimalMonitoredItem::sample_interval` plays the cadence-knob
role. Calling `subscribe_buffered` on an ASB session now returns
`Error::Unsupported { transport: TransportKind::Asb, operation: ... }`
synchronously, without touching the wire.

The error-construction logic is split into a free fn
`unsupported_subscribe_buffered_error()` so the gate's exact shape
is unit-testable without spinning up a live authenticator + transport
fake. New unit test asserts both the variant tag and that the
operation message names the unsupported method + hints at the
`sample_interval` analogue.
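
The free-function split can be sketched like this. The enum definitions are simplified stand-ins, and the `operation` field type and message text are assumptions, since the commit message elides them.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TransportKind {
    Nmx,
    Asb,
}

/// Simplified stand-in for the crate's error type; only the variant
/// named in the commit message is modelled.
#[derive(Debug, PartialEq, Eq)]
enum Error {
    Unsupported { transport: TransportKind, operation: String },
}

/// Builds the error without touching any session state, so its exact
/// shape can be asserted in a plain unit test.
fn unsupported_subscribe_buffered_error() -> Error {
    Error::Unsupported {
        transport: TransportKind::Asb,
        // Message wording is an assumption.
        operation: "subscribe_buffered (use MinimalMonitoredItem::sample_interval instead)"
            .to_string(),
    }
}

/// Hypothetical gate: fails synchronously, before any wire traffic.
fn subscribe_buffered_on_asb() -> Result<(), Error> {
    Err(unsupported_subscribe_buffered_error())
}
```

A test can then match on the variant and check that the message names both the unsupported method and the `sample_interval` hint.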

Workspace 758 → 759 tests, clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:32:45 -04:00
Joseph Doherty 2546710604 design/followups: add F35-F44 for M6 implementation plan
10 new followups decompose M6 (compatibility shim + production
hardening) into parallel-safe sub-streams:

- F35: mxaccess-compat LMXProxyServer-shaped facade (18 methods over
  Session/AsbSession)
- F36: Session::subscribe_buffered NMX path per R2 single-sample
- F37: ASB subscribe_buffered capability gate
- F38: counting-allocator cargo bench harness for R12 target
- F39: zero-copy codec pass (depends on F38)
- F40: optional metrics feature
- F41: cargo public-api baseline (depends on F35/F36/F37/F39/F40)
- F42: cargo doc cleanup pass
- F43: cargo publish --dry-run all crates (depends on F41)
- F44: decode buffered batch + suspend captures (077, 079-082, 094)
  for R2/R5 evidence

Parallelization: Wave 1 = F35/F36/F37/F38/F40/F42/F44 (different
crates / different concerns); Wave 2 = F39 (needs F38's bench);
Wave 3 = F41 (needs API stable); Wave 4 = F43 (release).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:28:38 -04:00
Joseph Doherty bedad57b4e design/followups: move F18 (M5 meta-tracker) to Resolved
rust / build / test / clippy / fmt (push) Has been cancelled
Trim the planning content (sub-stream breakdown table, parallel-safety
analysis, risk-driven sequencing, "Resolves when" gate) since M5 is
done. Keep the closure verdict, M5 DoD checklist showing the actual
state at close, sub-followup closeout list (F19-F26 + F28/F29/F30/
F31/F32/F33/F34), cumulative execution log, and the architectural
note explaining why AsbSession stays parallel to the NMX Session
rather than unified — that's load-bearing for future maintenance.

Open section now contains only F3 (cross-domain NTLM Type1/2/3
fixture, permanently external-blocked on this single-domain dev host
— resolution requires multi-domain Windows lab not available here).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:09:03 -04:00
Joseph Doherty b1a5f5ff1e design/followups: move F34 to Resolved (live-verified closure)
The F34 entry's body had grown into a debugging notebook with five
"Open hypotheses" and a "Resolves when" speculation block — all of
which are now moot since the actual fix landed. Trim to the closure
verdict, the technical evidence (captured fixture dictionary, the
dual-format insight), and the bonus context discovered while
debugging (Active/SampleInterval/result_code 32 quirks). Move from
"## Open" to "## Resolved" with date + commit 101a8b1.

Open section now contains only F18 (M5 meta-tracker, resolved at the
top) and F3 (cross-domain NTLM fixture, permanently external-blocked
on this single-domain dev host).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:04:37 -04:00
Joseph Doherty 101a8b13f5 [F34] mxaccess-asb: AddMonitoredItems body uses DataContract field names
Rewrite push_monitored_item_body to emit the DataContract field-suffix
names from AsbContracts.cs:940-965 (activeField, bufferedField,
itemField, sampleIntervalField, timeDeadbandField, userDataField,
valueDeadbandField) under prefix `b` bound to the
http://schemas.datacontract.org/2004/07/ArchestrAServices.ASBIDataV2Contract
namespace. The <Items> wrapper now declares xmlns:b + xmlns:i.

The legacy XmlSerializer property names (<Active>, <Item>,
<SampleInterval>, <Buffered>) only matter for the canonical-XML HMAC
signing input — that emitter at xml_canonical::emit_monitored_item is
unchanged and F28 fixture byte-equality still holds for all 13 ops.
On the binary NBFX wire MxDataProvider's DataContractSerializer
expects the field-suffix form.

Wire-byte type encoding matches the captured fixture
(add-monitored-items-request-wire.bin): bool → Bool record, ulong →
Zero/One/Chars (XmlConvert decimal text), ushort → Zero/One/Int8/Int16/Int32
(smallest-fit binary). Empty string? + null byte[]? emit as empty
elements with no <i:nil> attribute (matching the wire). Field order
follows the explicit [DataMember(Order = N)] sequence.
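
The type-to-record mapping described above can be sketched as two selector functions. The `Record` enum is a hypothetical simplification of the codec's token type, and the signed-range boundaries for the smallest-fit choice are an assumption about how "smallest-fit" is decided.

```rust
/// Simplified stand-in for the NBFX text-record kinds named above.
#[derive(Debug, PartialEq)]
enum Record {
    Zero,
    One,
    Int8(i8),
    Int16(i16),
    Int32(i32),
    Chars(String),
}

/// ushort: pick the smallest binary record that holds the value.
fn encode_ushort(value: u16) -> Record {
    match value {
        0 => Record::Zero,
        1 => Record::One,
        2..=127 => Record::Int8(value as i8),
        128..=32_767 => Record::Int16(value as i16),
        _ => Record::Int32(i32::from(value)),
    }
}

/// ulong: Zero/One shortcuts, otherwise decimal text.
fn encode_ulong(value: u64) -> Record {
    match value {
        0 => Record::Zero,
        1 => Record::One,
        _ => Record::Chars(value.to_string()),
    }
}
```

Values above `i16::MAX` fall through to `Int32` because the binary records are signed, so a `u16` such as 40000 cannot be carried in an `Int16`.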

Adjacent: ItemIdentity is nested via DataContract field names too —
NOT the binary <ASBIData> fast-path, which only kicks in at top-level
message body members.

Verified live against AVEVA MxDataProvider: AddMonitoredItems now
returns 1 status item with error_code=0x0000 (previously 0 items;
the silent failure was the deliberate DC-schema mismatch); Publish
poll #4 delivers the actual tag value as
AsbVariant { type_id: 4, length: 4, payload: [99,0,0,0] } through the
F26 stream.

Pre-existing clippy::format_collect errors in auth.rs:339,342 and
client.rs:952 fixed in passing — they were blocking workspace clippy
otherwise.

Workspace: 757 → 758 tests, clippy -D warnings clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:01:11 -04:00
Joseph Doherty 6762526f09 design/followups: mark F18 (M5 meta-tracker) resolved
All sub-followups F19-F26 closed; M5 is functionally LIVE end-to-end
(asb-subscribe returns the real tag value over the wire). The
structural-port followups F18 spawned (F2/F10/F11/F27 for NTLM /
DCOM / RemUnknown / DH) all resolved separately. F18 stays under
"## Open" as the cumulative-execution-log anchor; status line at the
top now reflects the closed state so the open/resolved structure
matches reality.

Remaining open items: F34 (MonitoredItem wire format, P2 — needs
nbfx auto-intern fix + DataContract field-suffix body builders) and
F3 (cross-domain NTLM fixture, P2 — permanently external-blocked
on this single-domain dev host).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 03:35:41 -04:00
Joseph Doherty 1de049e114 [F2] mxaccess-rpc: NTLM verify_signature (server-to-client) with constant-time MAC compare
rust / build / test / clippy / fmt (push) Has been cancelled
Closes F2. Structural port from [MS-NLMP] §3.4.4 — same shape as
the existing sign path but uses the server-to-client sub-keys
(`SealKey_S→C` / `SignKey_S→C`) derived alongside the client-to-
server pair at the end of create_type3.

NtlmClientContext gained four new fields populated during
create_type3:
  - server_signing_key
  - server_sealing_key
  - server_sealing_state (independent RC4 stream)
  - server_sequence (independent counter)

The S→C key derivation already existed in auth.rs (the seal_key /
sign_key helpers take a client_mode flag); F2 plumbs them into a
new verify_signature(message, signature) method.

The verify path:
  1. Validates signature.len() == 16 + leading version word 0x01.
  2. Reads trailing seq num, compares against self.server_sequence
     (mismatch ⇒ InvalidSignature, no state change).
  3. Computes expected_mac = HMAC_MD5(server_signing_key,
     seq || message)[0..8] then RC4 transform.
  4. Constant-time compares expected_mac against wire bytes 4..12
     via subtle::ConstantTimeEq.
  5. On success: commits cipher-state advance + ++server_sequence.
     On failure: re-derives RC4 from server_sealing_key and skips
     past server_sequence × 8 keystream bytes to restore the
     pre-verify position — caller can retry.
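
The check ordering in steps 1, 2, and 4 can be sketched without the cryptography. In this illustration the expected MAC is passed in rather than computed (the HMAC-MD5 and RC4 steps are omitted), the names are hypothetical, and the hand-rolled `ct_eq` is only a stand-in for `subtle::ConstantTimeEq`, which offers stronger guarantees against compiler optimisation.

```rust
#[derive(Debug, PartialEq)]
enum SigError {
    BadLength,
    BadVersion,
    WrongSequence,
    MacMismatch,
}

/// Compares two 8-byte MACs without an early exit on the first
/// differing byte. Stand-in for `subtle::ConstantTimeEq`.
fn ct_eq(a: &[u8; 8], b: &[u8; 8]) -> bool {
    let mut diff = 0u8;
    for i in 0..8 {
        diff |= a[i] ^ b[i];
    }
    diff == 0
}

/// Layout assumed from the steps above: version (bytes 0..4, LE),
/// MAC (bytes 4..12), sequence number (bytes 12..16, LE).
fn check_signature(sig: &[u8], expected_mac: &[u8; 8], expected_seq: u32) -> Result<(), SigError> {
    if sig.len() != 16 {
        return Err(SigError::BadLength);
    }
    let version = u32::from_le_bytes([sig[0], sig[1], sig[2], sig[3]]);
    if version != 1 {
        return Err(SigError::BadVersion);
    }
    let seq = u32::from_le_bytes([sig[12], sig[13], sig[14], sig[15]]);
    if seq != expected_seq {
        return Err(SigError::WrongSequence);
    }
    let mut wire_mac = [0u8; 8];
    wire_mac.copy_from_slice(&sig[4..12]);
    if !ct_eq(&wire_mac, expected_mac) {
        return Err(SigError::MacMismatch);
    }
    Ok(())
}
```

The function is pure, so a failed check leaves no state to roll back; in the real method the sequence counter and RC4 position are only committed on success, as step 5 describes.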

New dep `subtle = "2"` (workspace-internal to mxaccess-rpc) for
the timing-oracle-safe MAC compare.

6 new tests:
  - verify_signature_round_trip_against_sign (3-message sequence
    via paired_authed_context helper that aliases server-side keys
    onto client-side for self-validating round-trip)
  - verify_signature_rejects_corrupted_mac (with
    server_sequence-non-advance assertion)
  - verify_signature_rejects_wrong_sequence_number
  - verify_signature_rejects_wrong_version_field
  - verify_signature_rejects_wrong_length
  - verify_signature_before_authenticate_errors

mxaccess-rpc 188 → 194 tests; default-feature clippy clean.

The "awaiting wire-fixture capture" step listed in F2's prior
status note is no longer a hard prerequisite — [MS-NLMP] §3.4.4
fully defines the algorithm and the round-trip tests prove the
encoder/decoder pair is internally consistent. A captured
StatusReceived frame would still validate byte-parity vs a real
NmxSvc.exe signer, but that's future verification work; the
structural port ships unblocked.

design/followups.md F2 moved to Resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 03:30:48 -04:00
Joseph Doherty 161b0cedfa [F10 + F11] mxaccess-rpc: structural ports for ResolveOxid2 + RemAddRef/RemRelease
rust / build / test / clippy / fmt (push) Has been cancelled
Closes F10 and F11 per option (b) of each followup's resolve
criterion: hand-rolled body codecs derived from the [MS-DCOM]
spec, ship structurally with no live fixture (the .NET reference
doesn't call these opnums), validate against captured frames when
they become available.

F10 — IObjectExporter::ResolveOxid2 (opnum 4):
  Per [MS-DCOM] §3.1.2.5.1.4. New parse_resolve_oxid2_result
  mirrors parse_resolve_oxid_result exactly except for the extra
  4-byte COMVERSION slot (u16 major + u16 minor) between
  authn_hint and error_status. Trailing-fields check tightens
  from 24 bytes (opnum 0) to 28 bytes (opnum 4). New ComVersion +
  ResolveOxid2Result types. referent_id=0 short-circuit returns
  empty bindings + default ComVersion + tail-status, matching
  opnum 0's pattern.

F11 — IRemUnknown::RemAddRef + RemRelease (opnums 4 and 5):
  Per [MS-DCOM] §3.1.1.5.6 + §2.2.19 (REMINTERFACEREF). Both
  opnums share the wire shape, so:
    - encode_rem_add_ref_request + encode_rem_release_request
      both delegate to a shared encode_remref_array_request
      helper.
    - parse_remref_response is shared between them too — they
      have an identical OrpcThat + referent_id + max_count +
      N×HRESULT + error_code layout.
  New RemInterfaceRef struct (ipid + public_refs + private_refs,
  ENCODED_LEN = 24) + RemRefResponse decoded shape.
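
The 24-byte `RemInterfaceRef` shape (16-byte IPID plus two 32-bit reference counts) can be sketched as below. Treating the IPID as 16 opaque bytes and the counts as little-endian are assumptions for this illustration; the method names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RemInterfaceRef {
    /// Interface pointer identifier, kept as opaque bytes in this sketch.
    ipid: [u8; 16],
    public_refs: u32,
    private_refs: u32,
}

impl RemInterfaceRef {
    const ENCODED_LEN: usize = 24;

    /// Serialises as ipid || public_refs (LE) || private_refs (LE).
    fn encode(&self) -> [u8; 24] {
        let mut out = [0u8; 24];
        out[..16].copy_from_slice(&self.ipid);
        out[16..20].copy_from_slice(&self.public_refs.to_le_bytes());
        out[20..24].copy_from_slice(&self.private_refs.to_le_bytes());
        out
    }

    /// Returns `None` when fewer than `ENCODED_LEN` bytes are available.
    fn decode(bytes: &[u8]) -> Option<RemInterfaceRef> {
        if bytes.len() < Self::ENCODED_LEN {
            return None;
        }
        let mut ipid = [0u8; 16];
        ipid.copy_from_slice(&bytes[..16]);
        let public_refs = u32::from_le_bytes([bytes[16], bytes[17], bytes[18], bytes[19]]);
        let private_refs = u32::from_le_bytes([bytes[20], bytes[21], bytes[22], bytes[23]]);
        Some(RemInterfaceRef { ipid, public_refs, private_refs })
    }
}
```

Since RemAddRef and RemRelease share this element layout, one encoder and one decoder can serve both request builders.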

8 new structural tests across both modules pin every documented
edge of each shape — short stubs, referent-zero short-circuits,
truncated-trailing detection, full multi-element round-trips.
mxaccess-rpc 183 → 188 tests; default-feature clippy clean.

Both followups documented "**Status:** Awaiting wire-fixture
capture" prior to this commit; the structural-port option was
explicitly listed as resolution path (b) in each. Future captured
frames will validate the byte-by-byte match — until then the
port is byte-faithful to the spec but unverified against a live
peer (which is fine for shipping since neither opnum is on the
NMX session's hot path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 03:24:12 -04:00
Joseph Doherty 4ed1355761 design/followups: rewrite F2/F3/F10/F11 with concrete next-step recipes
Each remaining open followup now lists the precise "Concrete next
step" to close it — what to capture, where to write the fixture,
which file to edit. Future sessions (or anyone without the project
context) can pick up any of these and execute without guessing.

F2 (NTLM verify_signature server→client):
  Status: awaiting wire-fixture capture. M2 wave 3 (callback exporter)
  is closed under F15, so the path is wired — instrument
  CallbackExporter to hex-dump inbound StatusReceived bytes during a
  live subscribe, save under tests/fixtures/m2-status-frame/, port
  verify_signature mirroring `sign` but using the server-to-client
  sub-keys per [MS-NLMP] §3.4.4, add `subtle = "2"` for constant-time
  MAC compare.

F3 (cross-domain NTLM Type1/2/3 fixture):
  Status: permanently out-of-scope on this host (no second AD
  domain). Documented the lab-environment requirements and the
  capture procedure for whoever provisions the two-domain harness.

F10 (IObjectExporter::ResolveOxid2 opnum 4):
  Status: awaiting capture or .NET helper. Two paths documented —
  extend MxNativeClient.Probe with --probe-resolve-oxid2 OR hand-roll
  the layout from [MS-DCOM] §3.1.2.5.1.4 and validate later.

F11 (IRemUnknown::RemAddRef + RemRelease):
  Status: same shape as F10. Document either probe extension or
  structural port from [MS-DCOM] §3.1.1.5.6 (REMINTERFACEREF[]).

No code changes in this commit — purely sharpening the followup
specs so each one's resolution recipe is self-contained.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 03:18:36 -04:00
Joseph Doherty 9496322712 [F27] mxaccess-asb-nettcp: constant-time DH mod_exp via crypto-bigint::DynResidue
rust / build / test / clippy / fmt (push) Has been cancelled
Closes F27 per option (b) of its resolve criterion: fixed-width
U2048 DH backend using crypto-bigint's Montgomery-form residue
arithmetic.

New auth.rs::constant_time_mod_exp(base, exp, modulus) wrapper
preserves the BigUint-in-BigUint-out API of the existing byte
helpers; the actual square-and-multiply chain runs in Montgomery
form. Both DH call sites swap to the wrapper:
  - AsbAuthenticator::new line 179 (public-key generation)
  - crypto_key line 354 (shared-secret derivation)

DH private exponent timing-leak resistance is the goal: the .NET
reference's BigInteger.ModPow is also non-constant-time, so we
were at parity but not at the long-term Rust target. With this
fix the production path no longer leaks the bit-pattern of the
long-lived DH private key through power/timing side channels.

DynResidueParams::new requires an odd modulus (Montgomery form's
only restriction). Production DH primes are always odd
(`MX_ASB_DH_PRIME = 1552...7919` on this host's registry).
CryptoParameters::DEFAULT_PRIME_TEXT — the test-fixture default
inherited from AsbRegistry.cs:66 — ends in 4 (even), which is
mathematically unsound for DH but kept for parity with the .NET
default. For that case the wrapper falls back to BigUint::modpow,
preserving the wire bytes (modular exp is a pure function of
inputs).
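
The odd/even dispatch can be illustrated on toy 64-bit values. This is a sketch only: it substitutes a Montgomery ladder over `u64` for the crate's Montgomery-form `U2048` arithmetic, the ladder still branches on exponent bits, and nothing here is constant-time in the cryptographic sense. All function names are hypothetical.

```rust
/// (a * b) mod m using a 128-bit intermediate to avoid overflow.
fn mul_mod(a: u64, b: u64, m: u64) -> u64 {
    ((a as u128 * b as u128) % m as u128) as u64
}

/// Plain square-and-multiply: work done depends on the exponent bits.
fn mod_exp_plain(base: u64, mut exp: u64, m: u64) -> u64 {
    let mut result = 1 % m;
    let mut acc = base % m;
    while exp > 0 {
        if exp & 1 == 1 {
            result = mul_mod(result, acc, m);
        }
        acc = mul_mod(acc, acc, m);
        exp >>= 1;
    }
    result
}

/// Montgomery ladder: always 64 iterations, two multiplications each.
/// Toy stand-in for the fixed-width backend, not a side-channel defence.
fn mod_exp_ladder(base: u64, exp: u64, m: u64) -> u64 {
    let mut r0 = 1 % m;
    let mut r1 = base % m;
    for i in (0..64).rev() {
        if (exp >> i) & 1 == 0 {
            r1 = mul_mod(r0, r1, m);
            r0 = mul_mod(r0, r0, m);
        } else {
            r0 = mul_mod(r0, r1, m);
            r1 = mul_mod(r1, r1, m);
        }
    }
    r0
}

/// Dispatch in the spirit of the wrapper: odd modulus takes the
/// fixed-iteration path, even modulus falls back to the plain one.
fn mod_exp(base: u64, exp: u64, m: u64) -> u64 {
    assert!(m != 0, "modulus must be non-zero");
    if m % 2 == 1 {
        mod_exp_ladder(base, exp, m)
    } else {
        mod_exp_plain(base, exp, m)
    }
}
```

Both paths compute the same function of their inputs, which is why the fallback cannot change the bytes on the wire.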

Wire-byte parity verified two ways:
1. Unit fixture test
   `auth.rs::deterministic_hmac_matches_dotnet_fixture` — byte-equal
   to captured .NET output for the full DH → PBKDF2 → AES-CBC chain.
   Continues to pass.
2. Live: Connect handshake against the local AVEVA install
   completes with apollo:V2 lifetime, proving MxDataProvider
   accepts the constant-time-derived public key and the
   shared-secret-based AuthenticateMe.

Workspace deps:
  - crypto-bigint = "0.5" added to [workspace.dependencies] and
    mxaccess-asb-nettcp/Cargo.toml.
  - num-bigint retained for decimal-string parsing + .NET-LE byte
    conversion (crypto-bigint has neither).

Closes the "review.md MAJOR finding" originally flagged at
design/30-crate-topology.md:269-274.

design/followups.md: F27 moved to Resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 03:16:33 -04:00
Joseph Doherty d03bd04ef5 [F34 evidence] dump WCF binary-header dictionary for AddMonitoredItems
rust / build / test / clippy / fmt (push) Has been cancelled
Extends tests/add_monitored_items_request_capture.rs with a manual
binary-header walk that prints every pre-interned string + its wire
id. The captured request's binary header pre-declares **23 strings**
covering the entire DataContract field set:

  wire-id  1  http://ASB.IDataV2:addMonitoredItemsIn
  wire-id  3  AddMonitoredItemsRequest
  wire-id  5  SubscriptionId
  wire-id  7  Items
  wire-id  9  http://schemas.datacontract.org/.../ASBIDataV2Contract
  wire-id 11  MonitoredItem
  wire-id 13  activeField
  wire-id 15  activeFieldSpecified
  wire-id 17  bufferedField
  wire-id 19  itemField
  wire-id 21  contextNameField
  wire-id 23  idField
  wire-id 25  idFieldSpecified
  wire-id 27  nameField
  wire-id 29  referenceTypeField
  wire-id 31  typeField
  wire-id 33  sampleIntervalField
  wire-id 35  timeDeadbandField
  wire-id 37  timeDeadbandFieldSpecified
  wire-id 39  userDataField
  wire-id 41  lengthField
  wire-id 43  payloadField
  wire-id 45  valueDeadbandField

That gives F34's binary-builder rewrite the exact dict-id mapping
to target — every MonitoredItem child can be emitted as a
DictionaryStatic(odd-id) reference instead of an inline string,
matching WCF's compression. The "RequireId" mystery from the
earlier inline-name decode is also resolved: the wire body has
NO `RequireId` element at the bottom — the trailing
`Inline("referenceTypeField")` was a dict-id wraparound or
auto-intern artifact, not actual content.
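
The listing above follows a simple pattern: the string at zero-based position `n` in the header gets wire id `2n + 1`. A small lookup sketch, with hypothetical function names and the strings copied from the listing as printed (including the elided namespace URL):

```rust
/// Pre-interned strings in header order, as listed above.
const HEADER_STRINGS: [&str; 23] = [
    "http://ASB.IDataV2:addMonitoredItemsIn",
    "AddMonitoredItemsRequest",
    "SubscriptionId",
    "Items",
    "http://schemas.datacontract.org/.../ASBIDataV2Contract",
    "MonitoredItem",
    "activeField",
    "activeFieldSpecified",
    "bufferedField",
    "itemField",
    "contextNameField",
    "idField",
    "idFieldSpecified",
    "nameField",
    "referenceTypeField",
    "typeField",
    "sampleIntervalField",
    "timeDeadbandField",
    "timeDeadbandFieldSpecified",
    "userDataField",
    "lengthField",
    "payloadField",
    "valueDeadbandField",
];

/// Odd wire id for the string at a given zero-based header position.
fn wire_id(position: usize) -> usize {
    2 * position + 1
}

/// Looks up the wire id for a name, or `None` if it was never interned.
fn wire_id_of(name: &str) -> Option<usize> {
    HEADER_STRINGS.iter().position(|s| *s == name).map(wire_id)
}
```

A name that is absent from the header, such as `RequireId`, has no id to reference, which is consistent with it not appearing in the wire body.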

design/followups.md F34 updated with the full ground-truth header,
plus a refined "Resolves when" pointing at the underlying
`nbfx.rs::decode_tokens` auto-intern semantics. The current codec's
doc comment ("the codec doesn't auto-intern") is correct for raw
[MC-NBFX] but wrong for WCF binary messages where the writer
auto-interns by convention; that's the structural fix the F34 binary
rewrite depends on.

No code-path change in this commit beyond the test improvements.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 03:05:20 -04:00
Joseph Doherty b66f5bb018 [F34 evidence] capture AddMonitoredItems request wire + decoder trace
rust / build / test / clippy / fmt (push) Has been cancelled
Investigation continued via examples/asb-relay.rs middleman:
captured the .NET probe's verbatim AddMonitoredItems request bytes
(695 bytes with the 3-byte NMF SizedEnvelope header). Saved at
rust/crates/mxaccess-asb/tests/fixtures/add-monitored-items-request-wire.bin
as the ground-truth shape MxDataProvider actually accepts.

New tests/add_monitored_items_request_capture.rs runs decode_envelope
over the capture and dumps every NBFX token to stderr for inspection.

Decoded trace surfaces a SECOND, deeper issue:

The F30 dynamic-dict-resolution post-pass at
envelope.rs::resolve_dict_names_in_tokens mis-maps per-session dict
ids. Decoding the captured request renders namespace-URL slots as
field-name strings:

  body[1]=DefaultNamespace { value: Chars("nameField") }   ← bogus
  body[7]=NamespaceDeclaration { prefix: "i",
                                 value: Chars("activeField") }  ← bogus

and leaves most element names as `Static(NN)` instead of resolving
to inline names like `activeField` / `bufferedField` / `itemField`.

This blocks F34's substantive fix (rewrite
build_add_monitored_items_request_body to use DataContract
field-suffix names matching the wire). We can't validate the
rewritten builder against the captured fixture until the dict
post-pass produces the right strings.

design/followups.md F34 updated with two-prerequisite resolution
plan:
  1. Fix the F30 dynamic-dict resolution so the captured request
     decodes to recognisable inline names.
  2. Rewrite the AddMonitoredItems / DeleteMonitoredItems builders
     against the now-readable structure (DataContract field names
     + namespace prefixes for ASBIDataV2Contract / ASBContract +
     nested DataContract serialization of ItemIdentity inside
     `<itemField>` and Variants inside userDataField /
     valueDeadbandField).

Workspace: mxaccess-asb 96 → 97 (+1 capture-driven analysis test);
default-feature clippy clean. The HMAC canonical-XML signing path
remains correct (F28 fixtures are byte-equal to .NET); only the
binary NBFX wire body needs the rewrite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 02:58:25 -04:00
Joseph Doherty fb40e4c20b [F34 partial] mxaccess-asb: fix collect_asbidata_payloads + add Active flag
rust / build / test / clippy / fmt (push) Has been cancelled
Investigation via examples/asb-relay.rs middleman captured the full
S→C bytes of a working PublishResponse from the .NET probe against
MxDataProvider. Decoder fix verified by regression test against the
captured fixture; one further wire-format gap surfaced and is filed.

Closed in this commit:

1. collect_asbidata_payloads filtered out empty <ASBIData/> elements
   so positional payload[N] indexing collapsed when Status was
   empty-but-present. The wire form for PublishResponse is:
     <Status><ASBIData/></Status>          ← empty placeholder
     <Values><ASBIData>{bytes}</ASBIData></Values>
   Our decoder lost the positional info and read Values as Status,
   then panicked on the malformed parse. Fix: always push every
   <ASBIData> element (empty or not) so payloads[0]=Status and
   payloads[1]=Values stay aligned. New regression test
   tests/publish_capture.rs runs the full decode chain over the
   captured wire bytes (305-byte frame at
   tests/fixtures/publish-response-with-value.bin) and asserts
   values.len() == 1.

2. MinimalMonitoredItem.active: Option<bool> + new with_active()
   constructor. The .NET reference's MxAsbDataClient.AddMonitoredItems
   defaults to active: true (cs:441). Without <Active>true</Active>
   on the wire, MxDataProvider treats the subscription as inactive
   and Publish polls return empty Values. Both binary build and
   canonical XML emitters now conditionally emit <Active> when
   active.is_some(). Shared push_monitored_item_body helper
   eliminates the duplicate MonitoredItem encoder between
   AddMonitoredItems and DeleteMonitoredItems builders.

3. SampleInterval unit: clarified as **milliseconds** in
   MinimalMonitoredItem.sample_interval doc + the example
   (sample_interval_ticks → sample_interval_ms = 1000). Matches the
   .NET reference's `ulong sampleInterval = 1000` default.
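
The positional-alignment fix in item 1 can be illustrated with a
minimal sketch. It is not the crate's actual code: each `<ASBIData>`
element is modelled as a plain byte vector (empty for a self-closing
element), and both function names are hypothetical.

```rust
/// One `<ASBIData>` element's payload; empty for a self-closing `<ASBIData/>`.
/// (Illustrative model only, not the crate's real token type.)
type Payload = Vec<u8>;

/// Pre-fix behaviour: dropping empty elements shifts every later index,
/// so the Values payload ends up at index 0 where Status is expected.
fn collect_dropping_empty(elements: &[Payload]) -> Vec<Payload> {
    elements.iter().filter(|p| !p.is_empty()).cloned().collect()
}

/// Post-fix behaviour: every element is kept, so index 0 stays Status
/// and index 1 stays Values even when Status is an empty placeholder.
fn collect_keeping_empty(elements: &[Payload]) -> Vec<Payload> {
    elements.to_vec()
}
```

With a two-element body (empty Status, non-empty Values), the first
function returns one payload and the second returns two, which is the
difference the regression test pins.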

Open: F34's deeper finding — `MonitoredItem`'s wire schema is
DataContract field-suffix names (`activeField`, `bufferedField`,
`itemField`, `sampleIntervalField`, etc., per the per-session NBFX
dictionary the .NET probe declares), NOT XmlSerializer property
names (`Active`, `Buffered`, `Item`, `SampleInterval`). Our binary
NBFX builder still uses the property names, so MxDataProvider
silently fails to register monitored items — successField=true with
a 0-length Status array. The fix needs a complete rebuild of
build_add_monitored_items_request_body and
build_delete_monitored_items_request_body to use the field-suffix
names plus emit the *Specified siblings (activeFieldSpecified,
idFieldSpecified, etc.) as their own elements. The HMAC canonical
XML side is unaffected (XmlSerializer naming is correct there;
verified byte-equal to .NET via the F28 fixtures). Detailed in
design/followups.md F34's "Open" section.

Live verification after the F34-partial changes:
  - Read still returns 99 end-to-end via canonical XML signing.
  - AddMonitoredItems still returns Status[0] = 0 items
    (server doesn't recognize our DataContract-misnamed payload).
  - Publish still returns 0 values (the F34-open consequence).
  - All other 13 canonical-XML signed ops succeed at the request
    level (no SOAP faults, no HMAC rejections).

Workspace: mxaccess-asb 95 → 96 (+1 capture-driven decoder test);
default-feature clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 02:49:11 -04:00
Joseph Doherty 0771664092 asb: SampleInterval unit fix + F34 followup for Publish-decoder gap
rust / build / test / clippy / fmt (push) Has been cancelled
Investigation triggered by "Publish returns 0 values where .NET sees real
values" against the local AVEVA install.

Three findings:

1. SampleInterval unit: the wire field is **milliseconds**, not 100-ns
   ticks. The .NET reference (MxAsbDataClient.cs:441) defaults to
   `ulong sampleInterval = 1000` and the probe passes `subscribeSampleMs`
   directly through that surface. Sending 10_000_000 (1s in 100-ns ticks)
   makes MxDataProvider schedule the next sample ~2.8 hours out; Publish
   polls always come back empty until the misinterpreted timer expires.
   Fixed in `examples/asb-subscribe.rs` (sample_interval_ticks →
   sample_interval_ms = 1000) and clarified in
   `MinimalMonitoredItem.sample_interval`'s doc comment with the live-2026-05-06
   evidence.

2. result_code=32 is `AsbErrorCode.PublishComplete`
   (`AsbResultMapping.cs:37`) — informational, not a fatal error. .NET's
   `ToResult` (cs:122-129) explicitly treats it like Success.
   `ArchestrAResult.ErrorCode` and `ResultCode` are aliases for the same
   `resultCodeField` (cs:424-434), so `publish[i]_error=0x00000020` in
   the .NET probe trace = `result_code=Some(32)` in our trace = the same
   thing. Already handled correctly via the F26 narrower-bail fix
   (commit 983f029) — no code change needed.

3. **F34 filed** for the residual gap: with both sides seeing
   result_code=32 + success=false, .NET extracts a value but we extract
   zero. Three open hypotheses (wire-shape mismatch / payload-locator
   bug / MonitoredItemValue byte-layout bug); all need a middleman
   asb-relay.rs trace between the .NET probe and MxDataProvider to
   confirm. Adjacent symptom: AddMonitoredItemsResponse Status reads as
   0 items where .NET sees 1 — likely the same root cause; one fix
   should close both.
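
The "~2.8 hours" figure in finding 1 follows from reading a tick count
as milliseconds. A small worked check, with hypothetical helper names
(only the 100-ns tick size and the 10_000_000 value come from the text
above):

```rust
/// Number of 100-ns ticks in one millisecond.
const TICKS_PER_MS: u64 = 10_000;
/// Milliseconds in one hour.
const MS_PER_HOUR: f64 = 3_600_000.0;

/// What the sender intended: 10_000_000 ticks is 1000 ms (one second).
fn ticks_to_ms(ticks: u64) -> u64 {
    ticks / TICKS_PER_MS
}

/// What the server does with the raw number when the field is milliseconds.
fn ms_to_hours(ms: u64) -> f64 {
    ms as f64 / MS_PER_HOUR
}
```

`ticks_to_ms(10_000_000)` is 1000, while `ms_to_hours(10_000_000)` is
about 2.78, which matches the observed scheduling delay.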

Live re-runs to validate the new sample-interval unit were blocked by
the documented F32 InvalidConnectionId transient (the
pending-connection table on MxDataProvider fills up after many
back-to-back test cycles; clears after a 30s+ cool-down).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 02:28:44 -04:00
Joseph Doherty 983f02921c asb-subscribe example: drive every canonical-XML signed op live
rust / build / test / clippy / fmt (push) Has been cancelled
Extends the example to exercise the full data-plane through the
new canonical-XML signing path (F28 step 2). Each op is announced
with a "[canonical XML <Op>]" tag in the trace so the lifecycle is
self-documenting:

  Connect → Register → Read → Write → CreateSubscription
  → AddMonitoredItems → Publish × N → PublishWriteComplete
  → DeleteMonitoredItems → DeleteSubscription
  → UnregisterItems → Disconnect → SendEnd

Per-section errors are caught and logged but don't abort the
lifecycle — a failed Publish still reaches Disconnect cleanly so
the server-side pending-connection table doesn't fill up.

New env vars MX_RUN_WRITE / MX_RUN_SUBSCRIBE / MX_SUBSCRIBE_COUNT
(defaults: run, run, 3) for opting into / sizing the optional steps.

Live verification on this host (this turn, first run):
  register status: 1 item(s); result_code=Some(0) success=Some(true)
  TestChildObject.TestInt = AsbVariant{type_id:4,length:4,payload:[99]}
  write status: 0 item(s); result_code=Some(0) success=Some(true)
  subscription_id=2 result_code=Some(0) success=Some(true)
  add status: 0 item(s); result_code=Some(0) success=Some(true)
  publish: 0 value(s); result_code=Some(32) success=Some(false)
  publish_write_complete: 0 write(s); result_code=Some(0)
  delete_monitored_items ok
  delete_subscription ok
  unregistering ... disconnecting

All 13 canonical-XML-signed ops accepted by MxDataProvider — no SOAP
faults, no HMAC rejections, no decode errors. F28 step 2 verified
end-to-end against the live AVEVA install.

Bonus fix: F26 stream's publish_loop bail logic narrowed.
The original F33 bail-on-any-non-zero-result_code was over-aggressive:
.NET's MxAsbClient.Probe shows that result_code=32 (= 0x20) fires on
*every* Publish poll while values are still being delivered. Updated
publish_loop and the example's Publish loop to bail only on
RESULT_CODE_INVALID_CONNECTION_ID (1) — that one truly means the
session is desynced. Other non-zero result_codes are informational
and the loop continues draining.
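
A minimal sketch of the narrowed bail rule, assuming the decision
reduces to a single predicate on the optional result code. The
function name `should_bail` is hypothetical; the constant value 1 is
the one stated above.

```rust
/// Result code that means the session is desynced (value from the text above).
const RESULT_CODE_INVALID_CONNECTION_ID: u32 = 1;

/// Bail only on InvalidConnectionId. Any other code, including 32
/// (0x20) seen on every Publish poll, is treated as informational.
fn should_bail(result_code: Option<u32>) -> bool {
    result_code == Some(RESULT_CODE_INVALID_CONNECTION_ID)
}
```

Under this rule a poll that returns `Some(32)` keeps the loop running,
whereas the earlier bail-on-any-non-zero rule would have stopped it.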

New public re-export: mxaccess_asb::RESULT_CODE_INVALID_CONNECTION_ID
(was crate-private under the operations module).

The InvalidConnectionId transient still hits after many back-to-back
test runs against a long-running MxDataProvider — the pending-
connection table fills up — same well-documented behaviour from F32.
A 30-second cool-down restores reliability in our experience.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 02:19:47 -04:00
Joseph Doherty 34d477819b [F28] mxaccess-asb: canonical XML signing for all 8 remaining ops
rust / build / test / clippy / fmt (push) Has been cancelled
Closes F28. The 5 [XmlSerializerFormat] ops landed in commit f14580e
(2026-05-05); this commit closes out the remaining 8 ConnectedRequest
shapes, eliminating the legacy NBFX-bytes signing fallback from every
`client::*` op.

Two deliverables:

1. Extended `MxAsbClient.Probe --dump-signed-xml` (.NET probe) to
   emit deterministic canonical-XML output for ReadRequest,
   WriteBasicRequest, PublishWriteCompleteRequest,
   CreateSubscriptionRequest, DeleteSubscriptionRequest,
   AddMonitoredItemsRequest, DeleteMonitoredItemsRequest,
   PublishRequest. Saved 8 fixtures at
   rust/crates/mxaccess-asb/tests/fixtures/signed-xml/*.xml. Pinned
   field values for reproducibility:
     - SubscriptionId = 0x1234_5678_9abc_def0
     - MaxQueueSize = 100, SampleInterval = 1000
     - WriteHandle = 0xDEAD_BEEF
     - WriteValue = Variant.FromInt32(42)
     - MonitoredItem with the existing sample-item shape

2. Ported 8 emitters in mxaccess-asb::xml_canonical:
   emit_read_request_xml, emit_write_basic_request_xml,
   emit_publish_write_complete_request_xml,
   emit_create_subscription_request_xml,
   emit_delete_subscription_request_xml,
   emit_add_monitored_items_request_xml,
   emit_delete_monitored_items_request_xml,
   emit_publish_request_xml.

   New helpers consolidate XmlSerializer's per-namespace shapes:
     - emit_invensys_text — primitive int/long fields in the parent
       urn:invensys.schemas namespace (no xmlns redeclaration).
     - emit_write_value — <Values> wrapper inlining
       Value (Variant), Status (default AsbStatus), Comment (xsi:nil).
     - emit_monitored_item — <Items> wrapper inlining
       Item, SampleInterval, ValueDeadband, UserData, Buffered.
     - emit_inline_item_identity — ItemIdentity rendered as a child
       of MonitoredItem (single xmlns redeclaration on the wrapper,
       children inherit).
     - emit_inline_text + emit_inline_optional_string —
       no-redeclaration variants of emit_iom_text +
       emit_iom_optional_string.
     - emit_idata_variant — Variant's Type/Length/Payload children
       in the http://asb.contracts.idata.data/20111111 namespace
       (Payload self-closes with xsi:nil when Length=0).
     - emit_iom_default_variant — wrapper for ValueDeadband / UserData
       (default-shape Variant in iom:2 namespace).

   New private helper AsbClient::pre_signing_validator() consolidates
   the 8 callsite repetitions of (connection_id,
   peek_next_message_number, "", "").

Wired into client::* — every send_signed_envelope[_one_way] call now
passes Some(&xml) for xml_for_signing. The 8 ops affected: read,
write, publish_write_complete, delete_monitored_items,
create_subscription, add_monitored_items, publish,
delete_subscription (plus their _once retry-loop variants).

8 new fixture-comparison tests (mxaccess-asb 87 → 95). Each emitter
was byte-equal to the .NET fixture on the first try, so no iteration
was needed. Workspace clippy clean.

Live verification: `cargo run -p mxaccess --example asb-subscribe`
returns TestChildObject.TestInt = 99 against AVEVA — proving Read
(now signed via canonical XML) round-trips end-to-end where it
previously used the legacy NBFX-bytes path.

The remaining 7 ops are wire-tested at fixture-byte-equality only;
live exercise is gated on the F33 follow-on capture for
subscribe-flow ops, but the canonical XML matches the .NET reference
byte-for-byte, so the HMAC will match by construction once the
session is in a state to issue those ops.

design/followups.md:
  - F28 moved to Resolved with the full two-step audit trail.
  - F18 M5 status block rewritten — all sub-followups (F26 stream,
    F28, F29, F32, F33) now closed. M5 DoD bullets 1+2+3+4 all green.
  - tests/fixtures/signed-xml/README.md updated to list the 8 new
    fixtures + their pinned input values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 02:13:16 -04:00
Joseph Doherty ff4ea4d5a9 [F16] mxaccess: real Session::recover_connection (re-bind + re-advise)
rust / build / test / clippy / fmt (push) Has been cancelled
Closes F16. Replaces the wave-2 no-op recover_connection with the
full .NET-equivalent shape (MxNativeSession.cs:399-474). Three
pieces:

1. Subscription registry on SessionInner.
   New subscriptions: Mutex<HashMap<[u8; 16], SubscriptionEntry>>
   tracks every active advise. subscribe() inserts after a successful
   AdviseSupervisory; unsubscribe() removes on the success path only
   (failed UnAdvises stay registered so next recovery replays them).
   The consumer's Subscription handle still holds the BroadcastStream;
   the registry is purely for AdviseSupervisory replay.

2. Pluggable RebuildFactory.
   New public typedef:
     pub type RebuildFactory = Arc<
         dyn Fn() -> Pin<Box<dyn Future<Output = Result<NmxClient,
                                                        NmxClientError>>
                            + Send>>
             + Send + Sync,
     >;
   Installed via Session::set_recovery_factory(factory);
   queryable via has_recovery_factory(). Kept separate from
   connect_nmx / connect_nmx_auto so existing constructors stay
   non-breaking — consumers opt in by calling the setter
   after-the-fact.

3. Real recover_connection + recover_connection_core.
   recover_connection is the retry loop (mirrors cs:399-440): for
   attempt in 1..=policy.max_attempts, emit RecoveryEvent::Started
   → call recover_connection_core → on Ok emit Recovered + return,
   on Err emit Failed{will_retry, error}, sleep policy.delay, retry,
   or bubble the last error.

   recover_connection_core mirrors cs:442-474: rebuild NMX via the
   factory → RegisterEngine2 with the saved callback_obj_ref → optional
   SetHeartbeatSendInterval → snapshot the registry under the lock,
   replay AdviseSupervisory(correlation_id) for each entry → atomically
   swap *nmx_lock = replacement. Old NmxClient drops at end of scope,
   closing its TCP transport.
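
The retry loop in item 3 can be sketched synchronously as below. This
is an illustration under assumptions, not the session code: the real
loop is async, the `attempt` fields, the `String` error type and the
`recover_with_retry` name are hypothetical, and events are pushed to a
`Vec` instead of a broadcast channel.

```rust
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
enum RecoveryEvent {
    Started { attempt: u32 },
    Recovered { attempt: u32 },
    Failed { attempt: u32, will_retry: bool, error: String },
}

struct RecoveryPolicy {
    max_attempts: u32,
    delay: Duration,
}

/// Runs `attempt_once` up to `max_attempts` times, recording one
/// Started event per attempt and either Recovered or Failed after it.
/// Returns the last error once the budget is exhausted.
fn recover_with_retry<F>(
    policy: &RecoveryPolicy,
    mut attempt_once: F,
    events: &mut Vec<RecoveryEvent>,
) -> Result<(), String>
where
    F: FnMut() -> Result<(), String>,
{
    let mut last_error = String::from("no attempts configured");
    for attempt in 1..=policy.max_attempts {
        events.push(RecoveryEvent::Started { attempt });
        match attempt_once() {
            Ok(()) => {
                events.push(RecoveryEvent::Recovered { attempt });
                return Ok(());
            }
            Err(error) => {
                // Only the final attempt reports will_retry = false.
                let will_retry = attempt < policy.max_attempts;
                events.push(RecoveryEvent::Failed {
                    attempt,
                    will_retry,
                    error: error.clone(),
                });
                last_error = error;
                if will_retry {
                    std::thread::sleep(policy.delay);
                }
            }
        }
    }
    Err(last_error)
}
```

With `max_attempts = 3` and an always-failing closure, this produces
the (Started, Failed)×3 sequence with a final `will_retry = false`,
the same shape the always-failing-factory test pins.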

Subscription correlation ids are preserved across the swap so the
consumer's Subscription stream continues to receive on its existing
broadcast filter. The CallbackExporter stays bound across recoveries
— no TCP listener re-bind.

R15's "long-lived connection task" was listed as a hard prereq, but
the existing Mutex<NmxClient> already serialises concurrent ops
during the rebuild — recover_connection_core holds the inner mutex
during the swap, so concurrent ops just wait. Functionally equivalent
to the long-lived-task design.

New ConfigError::RecoveryNotConfigured returned when
recover_connection is called without a factory installed. New
public re-export: RebuildFactory.

Tests (mxaccess 65 → 67):
  - recover_connection_without_factory_returns_recovery_not_configured
  - recover_connection_with_always_failing_factory_exhausts_attempts
    (pins (Started, Failed)×3 + final will_retry=false + bubbled
    TransportFailure)
  - subscribe_populates_registry_unsubscribe_clears_it
  - recovery_events_supports_multiple_subscribers (updated for the
    new factory-required path)

connect_nmx_auto-side auto-population of the factory (capturing the
ntlm_factory + discovered (addr, service_ipid) so consumers don't
re-author the closure) is a future polish — not required to close
F16.

design/followups.md: F16 moved to Resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 01:57:43 -04:00
Joseph Doherty 904f211aba .gitignore: cover ad-hoc debug captures + Claude Code state
Two patterns that have been polluting `git status` across recent
debugging sessions:

1. Root-level capture files from manual asb-relay / trace runs:
   - rust-cs.txt, rust-sc.txt, rust.log (asb-relay TCP dumps,
     hex-prefixed C->S / S->C and the relay log)
   - rust-trace-*.txt (MX_ASB_TRACE_REPLY=1 captures, e.g. the
     trace-orig dump that surfaced the F33 InvalidConnectionId
     evidence)

   Anything worth keeping should be promoted into `captures/` or
   `analysis/` with a name describing what it captures; these
   transient root-level files are noise.

2. `.claude/` — Claude Code's project-local state directory
   (scheduled-task locks, agent state). User/host-specific.

Removed the existing root-level rust-*.txt files in the same
commit; future runs will be ignored automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 01:45:28 -04:00
Joseph Doherty 079896c7bc design/followups: collapse 18 redundant 'Earlier slices' blocks
Each F18 cumulative-log step had its own '**Earlier slices:**' header
followed by a verbose body that duplicated the matching commit
message — content already preserved in `git show <hash>` for every
hash listed in the cumulative-log line at the top of F18.

Removes ~75 lines of redundancy:
  - 18× '**Earlier slices:**' headers and their bodies (F19, F20,
    F21, F22, F24, F23, F25 steps 1-10, F26 steps 1-3, example
    rewrite).
  - The stale 'F25 (...) and F26 (...) remain open' paragraph (both
    closed long since).

Keeps the substantive material in place:
  - The cumulative-log line listing every commit by hash.
  - The 5-finding F25 live-bring-up reconciliation block (justifies
    F28 + F29 followups).
  - The F26 step 3 AsbSession design rationale (explains why ASB
    parallels rather than unifies with the NMX Session — useful for
    future readers).
  - A one-sentence pointer to `git show <hash>` for per-step detail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 01:42:42 -04:00
Joseph Doherty cfeb761092 [F33] mxaccess-asb: complete InvalidConnectionId tolerance propagation
rust / build / test / clippy / fmt (push) Has been cancelled
Closes F33. Final commit in the three-step F33 closure
(218f4c4 → 7a5f251 → this): propagates the F31 InvalidConnectionId
tolerance pattern to every remaining response decoder and adds
publish-loop detection so the F26 stream terminates cleanly on
server-side rejections instead of spinning silently.

Decoders updated to tolerate empty / missing payloads + surface
result_code/success:
  - decode_publish_response (the F26 stream's hot path)
  - decode_unregister_items_response
  - decode_delete_monitored_items_response
  - decode_write_response
  - decode_publish_write_complete_response

Shared `extract_result_status(body_tokens)` helper in operations.rs
consolidates the per-decoder find_text_in_named_element calls for
resultCodeField + successField — a single source of truth for the
F31-pattern wrapper extraction.

Public response structs gain `result_code: Option<u32>` and
`success: Option<bool>`:
  - PublishResponse
  - UnregisterItemsResponse
  - DeleteMonitoredItemsResponse
  - WriteResponse
  - PublishWriteCompleteResponse

asb_session.rs::publish_loop: when PublishResponse.result_code is
Some(non_zero), the loop now sends Err(ConnectionError::TransportFailure
{ detail: "publish returned result_code 0xXX (server-side rejection)" })
as the stream's terminal item, then returns. Without this, an
InvalidConnectionId-poisoned subscription would generate empty
PublishResponse forever.

5 new tests synthesise the InvalidConnectionId wire shape
(`<Result><resultCodeField>1</><successField>false</></><ASBIData/><ASBIData/>`)
for each decoder via the shared synthesise_invalid_connection_id_body
helper — pin the tolerance for Publish, Unregister, Delete*, Write,
and PublishWriteComplete.

Updated obsolete write_response_missing_status_fails test to
write_response_missing_status_returns_empty_with_no_result_code
since the decoder no longer errors.

Live read regression test: TestChildObject.TestInt = 99 returned
end-to-end after all changes (cargo run -p mxaccess --example
asb-subscribe).

Workspace: mxaccess-asb 82 → 87 tests (+5). All other crates
unchanged. Default-feature clippy clean.

design/followups.md: F33 moved to Resolved with the full
three-commit audit trail. M5 status block stable: F32 + F33 closed,
only F28 (canonical XML for the remaining 8 ops) remains as P2
latent — works in practice under empty hashAlgorithm.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 01:37:11 -04:00
Joseph Doherty 7a5f251ac7 [F33 progress] mxaccess-asb: extend InvalidConnectionId tolerance to subscribe ops
rust / build / test / clippy / fmt (push) Has been cancelled
Continues the F31 tolerance pattern propagation to the two
subscribe-path decoders called out in F33.

CreateSubscriptionResponse:
- Adds result_code: Option<u32> + success: Option<bool> fields.
- decode_create_subscription_response no longer errors with
  MissingField "SubscriptionId" — when the server short-circuits on
  InvalidConnectionId it returns the Result wrapper without a
  SubscriptionId. The decoder returns subscription_id=0 in that case
  with result_code/success surfaced; callers inspect result_code
  before treating subscription_id as valid.
- AsbClient::create_subscription wraps the call in the same retry
  loop register_items uses (10 attempts × 200·N ms backoff on
  RESULT_CODE_INVALID_CONNECTION_ID).
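
Because `subscription_id = 0` is now a sentinel rather than an error,
callers need a check along these lines. The struct below is a
simplified stand-in for the real response type (the `i64` field type
is an assumption), `usable_subscription_id` is a hypothetical helper,
and the acceptance policy shown is just one reasonable choice.

```rust
/// Simplified stand-in for the decoded response (field types assumed).
struct CreateSubscriptionResponse {
    subscription_id: i64,
    result_code: Option<u32>,
    success: Option<bool>,
}

/// Returns the id only when the wrapper reports success and the id is
/// not the zero sentinel produced by a short-circuited server reply.
fn usable_subscription_id(resp: &CreateSubscriptionResponse) -> Option<i64> {
    let wrapper_ok = resp.result_code == Some(0) && resp.success != Some(false);
    if wrapper_ok && resp.subscription_id != 0 {
        Some(resp.subscription_id)
    } else {
        None
    }
}
```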

AddMonitoredItemsResponse:
- Adds result_code + success fields.
- decode_add_monitored_items_response tolerates an empty / missing
  <ASBIData /> Status payload (returns empty Vec) and surfaces
  result_code/success.
- AsbClient::add_monitored_items gets the same retry loop.

Both decoders now match the F31 + Read shape: tolerant of empty
payloads when the server short-circuits, surface the wrapper's
result_code so callers (and the in-client retry loop) can detect
the InvalidConnectionId race.

Updated obsolete unit test
(create_subscription_response_missing_id_fails →
create_subscription_response_missing_id_returns_zero_sentinel) plus
two new tests pinning the InvalidConnectionId synthesis path for
both decoders.

Workspace: mxaccess-asb 80 → 82 tests; default-feature clippy
clean; existing live-read still passes (verified separately).

This is the second of three F33 closures. Remaining: applying the
same tolerance to decode_publish_response (and optionally
decode_delete_*_response, decode_unregister_items_response,
decode_write_response, decode_publish_write_complete_response for
symmetry). With CreateSubscription + AddMonitoredItems tolerant
+ retrying, the subscribe-flow example should now reach the
publish-loop stage instead of bailing at the second op.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 01:29:38 -04:00
Joseph Doherty 218f4c4ec8 mxaccess-asb: extend F31 InvalidConnectionId tolerance to Read
rust / build / test / clippy / fmt (push) Has been cancelled
Live MX_ASB_TRACE_REPLY capture against MxDataProvider during the
F33 investigation showed Read hitting the same InvalidConnectionId
race that F31 fixed for register_items: server replies with a
Result wrapper carrying resultCodeField=1 + successField=false plus
two empty <ASBIData /> payloads. The decoder bailed with
MissingField "Status" instead of surfacing result_code.

Two changes:

1. ReadResponse gains result_code: Option<u32> and success:
   Option<bool> fields, matching the RegisterItemsResponse shape.
   decode_read_response tolerates empty / missing <ASBIData />
   payloads (returns empty status + values arrays) and surfaces
   the wrapper's result_code / success via
   find_text_in_named_element.

2. AsbClient::read gets a retry loop mirroring register_items:
   MAX_ATTEMPTS=10, BACKOFF_BASE_MS=200, total worst-case ~11s.
   Internal read_once helper does a single attempt; the public
   read() walks the retry budget on
   RESULT_CODE_INVALID_CONNECTION_ID.
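
The "~11s" worst case can be checked with a short calculation,
assuming a linear backoff of 200·N ms after attempt N and a sleep
after each of the 10 attempts. The helper names are hypothetical; the
two constants are the ones quoted above.

```rust
const MAX_ATTEMPTS: u64 = 10;
const BACKOFF_BASE_MS: u64 = 200;

/// Delay after the given 1-based attempt, assuming linear backoff.
fn backoff_ms(attempt: u64) -> u64 {
    BACKOFF_BASE_MS * attempt
}

/// Sum of all delays if every attempt fails: 200 * (1 + 2 + ... + 10).
fn worst_case_total_ms() -> u64 {
    (1..=MAX_ATTEMPTS).map(backoff_ms).sum()
}
```

The sum is 200 × 55 = 11_000 ms. If the loop skips the sleep after the
final attempt, the total drops to 9_000 ms, so the figure is an upper
bound on backoff time and excludes per-attempt round-trip latency.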

Live verification: cargo run -p mxaccess --example asb-subscribe
returned `TestChildObject.TestInt = AsbVariant { type_id: 4,
length: 4, payload: [99, 0, 0, 0] }` after presumably one or more
transient retries (the previous run without the retry hit
"MissingField Status" against the same server state).

1 new test
(read_response_tolerates_empty_asbidata_when_invalid_connection_id)
plus a synthesise_invalid_connection_id_body helper that builds
the canonical wire shape captured live (Result wrapper +
resultCodeField=1 + successField=false + two empty <ASBIData />
elements). mxaccess-asb went 79 → 80 tests (+1). All tests still
green; clippy clean on default and windows-com features.

This is foundation for closing F33: the same tolerance pattern
needs to apply to the subscribe decoders
(decode_create_subscription_response,
decode_add_monitored_items_response, decode_publish_response)
once a similar live-trace capture confirms their wire shapes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 01:25:41 -04:00
Joseph Doherty cbc95a4684 [F33] design/followups: capture live-subscribe wire gap
Live run of `cargo run -p mxaccess --example asb-subscribe` against
the local AVEVA install (with DH params + passphrase loaded from
Setup-LiveProbeEnv.ps1 + Get-AsbPassphrase.ps1) surfaced two concrete
gaps in the subscription-path response decoders:

1. `CreateSubscriptionResponse` returns subscription_id = 0 — the
   server almost certainly assigns a real Int64, but
   decode_create_subscription_response can't locate the
   `<SubscriptionId>` element. Likely a dict-id our F30 post-pass
   doesn't resolve for that specific element name.

2. `AddMonitoredItemsResponse` decode fails with MissingField
   "Status". The wire shape needs a capture-and-diff vs the .NET
   probe's subscription path.

Once subscribe-side ops are issued, the channel desyncs — subsequent
read() on the same session fails with the same MissingField error,
suggesting NBFX framing state may also be out of sync.

The F26 stream API itself (AsbSession::subscribe → Stream<Item =
Result<MonitoredItemValue, Error>>) is complete and unit-tested
(commit f2f22df). This followup just captures the live-wire
reconciliation work that's still required to make the subscribe
path actually return data against MxDataProvider. Once F33 closes,
the last M5 live-wire gap is resolved.

P2 — not blocking M5 closeout; blocks the Subscribe demo.

The asb-subscribe.rs example stays in its working Read-loop form
(no regression). When F33 lands, the example can be promoted to
demonstrate the full subscribe flow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 01:17:09 -04:00
Joseph Doherty f2f22dfcd1 [F26 stream] mxaccess: AsbSession::subscribe — Stream<Item = MonitoredItemValue>
rust / build / test / clippy / fmt (push) Has been cancelled
Closes the last F26 stub from the M5 status block. New
AsbSession::subscribe(subscription_id) returns an AsbSubscription
that impls Stream<Item = Result<MonitoredItemValue, Error>>. An
internal tokio::spawn'd publish-loop drains the subscription queue
via the existing AsbSession::publish() and fans each
PublishResponse's `values` array out as individual stream items.

Termination semantics:
  - Drop of AsbSubscription calls JoinHandle::abort() — the publish
    task stops draining the server-side queue (the .NET reference
    pattern at MxAsbDataClient.cs uses the same task-cancellation
    shape).
  - Transport error from publish() is delivered as the final stream
    item; the loop returns and the channel closes.
  - Receiver-drop (consumer stops polling) is detected when
    tx.send returns Err — the loop exits without making more
    publish calls.

The inner publish_loop helper takes any FnMut() -> Future<Result<...>>
so it's testable in isolation (no live ASB endpoint required).
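
The termination rules above can be sketched with a synchronous
stand-in. This is not the real helper: it swaps tokio's async channel
for `std::sync::mpsc`, uses `i32` and `String` as placeholder value
and error types, and takes a plain closure instead of one returning a
future.

```rust
use std::sync::mpsc::Sender;

/// Polls until an error occurs or the receiver goes away, fanning each
/// batch out as individual items (placeholder types, blocking variant).
fn publish_loop<F>(mut poll: F, tx: Sender<Result<i32, String>>)
where
    F: FnMut() -> Result<Vec<i32>, String>,
{
    loop {
        match poll() {
            Ok(values) => {
                for value in values {
                    // A send error means the consumer dropped the
                    // receiver, so stop without polling again.
                    if tx.send(Ok(value)).is_err() {
                        return;
                    }
                }
            }
            Err(error) => {
                // Deliver the error as the final item, then stop.
                let _ = tx.send(Err(error));
                return;
            }
        }
    }
}
```

Feeding it two successful batches followed by an error yields three
`Ok` items and one terminal `Err`, after which the channel closes
because the sender is dropped when the function returns.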

Per-item ItemStatus from the server is intentionally not surfaced
on the stream: the field is opaque per-item and rarely actionable
for the streaming consumer. A richer struct can wrap each value if
that need surfaces.

3 new tests pin:
  - asb_subscription_is_stream_send_unpin (compile-time bounds);
  - publish_loop_delivers_values_then_terminates_on_error
    (3 Ok values from 2 batches, then 1 terminal Err);
  - publish_loop_exits_when_consumer_drops_channel.

New deps used (already in mxaccess Cargo.toml): futures_util::Stream,
tokio::sync::mpsc, tokio_stream::wrappers::ReceiverStream,
tokio::task::JoinHandle.

Workspace: 718 → 721 tests. Default-feature clippy clean.
mxaccess crate-level doc updated to drop the "stubbed for next F26
iteration" note for the subscription stream.

design/followups.md F18 M5 status block updated: F26 stream
subscription marked resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 01:10:22 -04:00
Joseph Doherty 8e695b9347 [F12 wrapper + F32 close] Session::connect_nmx_auto + close M5 type-matrix DoD
rust / build / test / clippy / fmt (push) Has been cancelled
Two related closures in one commit:

1. Session-level wrapper around F12: new
   `mxaccess::Session::connect_nmx_auto(ntlm_factory, options,
   resolver, recovery)` gated on a new `mxaccess/windows-com` feature
   (which propagates `mxaccess-nmx/windows-com`). Drives
   `NmxClient::create` (the F12 COM-activation factory) for the
   `(host, port, service_ipid)` discovery, then funnels into the
   shared post-NMX-bind orchestration. Refactored `connect_nmx` to
   extract steps 1+2+4+5 into a private `from_nmx_client` helper —
   both `connect_nmx` and `connect_nmx_auto` reuse it so the
   `CallbackExporter` + router + `RegisterEngine2` + heartbeat policy
   stays in one place. The .NET `MxNativeSession.Open` shape
   (`MxNativeSession.cs:127-147`) is now reproduced end-to-end on
   Windows with `windows-com` on — callers no longer pre-resolve
   `(addr, service_ipid)` by hand.

   `connect_nmx`'s doc comment updated to drop the stale "F12 not yet
   wired" note. `parse_bracketed_host_port` in mxaccess-nmx gets a
   `cfg_attr(not(...), allow(dead_code))` so the default-feature
   build stays warning-clean.

2. F32 closed via option (b) of its own resolve criterion: the four
   missing types (Float / Double / DateTime / Duration) are gated on
   Galaxy-side template provisioning that's outside the Rust port's
   scope. The deployed test Galaxy on this host only has
   mx_data_type ∈ {1=Bool, 2=Int32, 5=String}; we cannot exercise
   the missing types without authoring new template attributes in
   the Aveva console (a manual platform-engineering task). The
   three-type live verification at commit 9063f10 satisfies the M5
   DoD bullet for what is deployable. F18's M5 status block updated
   to reflect F32-resolved.

Workspace: 718 tests pass on default features (was 712 before F12,
+6 from new parse_bracketed_host_port tests). Default-feature
clippy + windows-com clippy both clean.

Closes F32 in design/followups.md and extends F12's resolution note
with the Session-level wrapper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 22:30:25 -04:00
Joseph Doherty daa4ea3f16 [F12] mxaccess-nmx: NmxClient::create — auto-resolving COM-activation factory
rust / build / test / clippy / fmt (push) Has been cancelled
New constructor NmxClient::create(ntlm_factory) gated on
cfg(all(windows, feature = "windows-com")). New crate feature
mxaccess-nmx/windows-com propagates to mxaccess-rpc/windows-com.
Mirrors ManagedNmxService2Client.Create() (cs:30-64) plus
ResolveService (cs:491-523).

Six-step bring-up:
  1. com_objref_provider::marshal_activated_iunknown_objref(
       "NmxSvc.NmxService", MarshalContext::DifferentMachine)
     activates and emits the OBJREF.
  2. ComObjRef::parse extracts oxid + the activated server's IUnknown
     IPID.
  3. resolve_oxid_with_managed_ntlm_packet_integrity against
     127.0.0.1:135 (RPCSS endpoint mapper) returns the server's
     (host, port) bindings + IRemUnknown IPID.
  4. parse_bracketed_host_port pulls the host + port out of the
     ncacn_ip_tcp binding's `host[port]` text. Uses rfind for the
     rightmost brackets so FQDN forms (foo.example.com[1234])
     round-trip — matches the .NET ParseBracketedHost/Port shape at
     cs:540-561.
  5. A fresh DceRpcTcpClient binds to IRemUnknown and calls
     RemQueryInterface(iunknown_ipid, INmxService2_IID,
                        fresh_causality_id, public_refs=5).
  6. A second fresh transport binds to INmxService2 via Self::connect.

The ntlm_factory: impl FnMut() -> NtlmClientContext closure is
invoked three times (one per bind); each NtlmClientContext is
consumed by its bind, so the factory must produce fresh contexts.

New NmxClientError variants:
  - Activation(ProviderError) — only emitted with windows-com on.
  - EndpointResolution { reason } — covers no ncacn_ip_tcp binding,
    malformed host[port], non-zero RemQueryInterface HRESULT.

6 offline tests on parse_bracketed_host_port: FQDN host extraction,
rfind for rightmost brackets, rejection of missing '[' / missing
']' / non-numeric port / port overflow.
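
The `host[port]` parsing described in step 4 can be sketched as follows. This is a minimal illustration, not the crate's code: the `Result<_, String>` error type is an assumption, and only the rightmost-bracket rule and the listed rejection cases come from the notes above.

```rust
/// Hypothetical sketch: split a `host[port]` binding string into (host, port).
/// The String error type is an assumption for illustration only.
fn parse_bracketed_host_port(binding: &str) -> Result<(String, u16), String> {
    // rfind selects the rightmost brackets, so everything before the
    // final '[' is treated as the host (FQDNs included).
    let open = binding
        .rfind('[')
        .ok_or_else(|| format!("missing '[' in {binding:?}"))?;
    let close = binding
        .rfind(']')
        .ok_or_else(|| format!("missing ']' in {binding:?}"))?;
    if close < open {
        return Err(format!("']' precedes '[' in {binding:?}"));
    }
    // parse::<u16> rejects both non-numeric text and values above 65535.
    let port = binding[open + 1..close]
        .parse::<u16>()
        .map_err(|e| format!("bad port in {binding:?}: {e}"))?;
    Ok((binding[..open].to_string(), port))
}
```

Calling it with `"foo.example.com[1234]"` yields the host `foo.example.com` and port `1234`; the malformed inputs from the test list all return `Err`.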

1 live test (#[ignore], gated on MX_LIVE + MX_TEST_USER /
MX_TEST_PASSWORD / MX_TEST_DOMAIN populated by
tools/Setup-LiveProbeEnv.ps1): round-trips the full chain against
the AVEVA install on this host. Resolved INmxService2 IPID is
non-zero — verified end-to-end.

Workspace: mxaccess-nmx 17 → 23 (+6). All other crates unchanged.

Closes F12 in design/followups.md. F6 (ComObjRefProvider port) was
the prior blocker; with both landed, the COM-activation path is
end-to-end functional.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 22:21:49 -04:00
Joseph Doherty cf9dbaf568 [F6] mxaccess-rpc: ComObjRefProvider port via windows-rs (CoMarshalInterface)
rust / build / test / clippy / fmt (push) Has been cancelled
New module crates/mxaccess-rpc/src/com_objref_provider.rs gated on
cfg(all(windows, feature = "windows-com")). Pulls windows = "0.59"
(features Win32_Foundation + Win32_System_Com +
Win32_System_Com_Marshal + Win32_System_Com_StructuredStorage +
Win32_System_Memory) as an optional dep behind the existing
windows-com feature; default footprint stays slim.

Public API mirrors ComObjRefProvider.cs 1:1: MarshalContext enum
(InProcess / Local / DifferentMachine wrapping the MSHCTX_* newtype
constants), clsid_from_prog_id, marshal_activated_iunknown_objref
(activates via CoCreateInstance with INPROC | LOCAL | REMOTE then
marshals), marshal_iunknown_objref (uses IUnknown::IID),
marshal_interface_objref (CoMarshalInterface over an HGlobal-backed
IStream).

All `unsafe` is internal to the module — public API exposes only
typed Rust values (Vec<u8>, GUID, ProviderError), no raw pointers /
HRESULTs / lifetime-bound interface pointers leak. Each unsafe block
carries an inline SAFETY comment naming the invariants being upheld.

Per-thread COM init via thread-local OnceLock<()>: lazy
CoInitializeEx(MULTITHREADED) on first call; S_FALSE (already
initialised) and RPC_E_CHANGED_MODE (thread is STA) treated as
success — matches the .NET runtime's tolerant apartment behaviour.

ProviderError enumerates the four documented failure modes plus the
apartment-init pre-check: UnknownProgId / ActivationFailed /
MarshalFailed / GlobalLockFailed / ApartmentInitFailed.

4 offline tests: MarshalContext → MSHCTX_* mapping, ensure_apartment
idempotence, clsid_from_prog_id returns UnknownProgId for fake
ProgIDs, marshal_activated short-circuits at the resolution stage.

1 live test (#[ignore], gated on MX_LIVE): activates the real
NmxSvc.NmxService, marshals the proxy's IUnknown via
CoMarshalInterface, then parses the resulting blob via
ComObjRef::parse and asserts non-zero OXID + IPID. Passes against
the AVEVA install on this host.

Workspace tests: mxaccess-rpc went 179 → 183 (+4). All other crates
unchanged.

Unblocks F12 (NmxClient::create — the auto-resolving
COM-activation factory): the underlying primitive
(marshal_activated_iunknown_objref) now exists; remaining work is
threading the windows-com feature through mxaccess-nmx and chaining
ComObjRef::parse → resolve_oxid_with_managed_ntlm_packet_integrity →
RemQueryInterface. design/followups.md F12 updated with a revised
"Resolves when" reflecting that F6's blocker is gone.

Closes F6 in design/followups.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 22:11:33 -04:00
Joseph Doherty 41f2d4c0f2 [F14] mxaccess-galaxy: tiberius-backed SQL Resolver + UserResolver
rust / build / test / clippy / fmt (push) Has been cancelled
New module crates/mxaccess-galaxy/src/sql_resolver.rs (~480 LoC) gated
behind the existing galaxy-resolver Cargo feature. Adds SqlTagResolver
+ SqlUserResolver, both constructed via from_ado_string(&str)
accepting the same connection-string shape the .NET reference uses by
default (Server=localhost;Database=ZB;Integrated Security=True;
Encrypt=False;TrustServerCertificate=True). Integrated Security=True
resolves to Windows auth via tiberius's winauth feature.

Each top-level call (resolve / browse / resolve_by_guid /
resolve_by_name) opens a fresh Client<Compat<TcpStream>> and drops it
on return — matches the .NET `await using` lifecycle at
GalaxyRepositoryTagResolver.cs:93-95. tiberius's Client::query only
accepts positional @P1..@PN placeholders (delegates to sp_executesql);
the canonical RESOLVE_SQL / BROWSE_SQL / USER_BY_GUID_SQL /
USER_BY_NAME_SQL constants are rewritten once-per-process via
OnceLock<String> (@objectTagName → @P1, etc.). The unrewritten
constants stay byte-identical with the .NET reference for ad-hoc
diagnostic copy/paste.
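
A rough sketch of the once-per-process named-to-positional rewrite. The SQL text, the second parameter name, and both function names are placeholders; only `@objectTagName` → `@P1` and the `OnceLock<String>` caching are taken from the description above.

```rust
use std::sync::OnceLock;

// Placeholder SQL: the real RESOLVE_SQL text is not shown in this log.
const RESOLVE_SQL: &str = "SELECT 1\nWHERE @objectTagName = @attributeName";

/// Replace each `@name` with `@P<n>` in the order the names are listed.
/// Assumes no parameter name is a prefix of another.
fn rewrite_named_params(sql: &str, names: &[&str]) -> String {
    let mut out = sql.to_string();
    for (i, name) in names.iter().enumerate() {
        let named = format!("@{name}");
        let positional = format!("@P{}", i + 1);
        out = out.replace(named.as_str(), positional.as_str());
    }
    out
}

/// Rewritten once per process; the unrewritten constant stays untouched.
fn resolve_sql_positional() -> &'static str {
    static REWRITTEN: OnceLock<String> = OnceLock::new();
    REWRITTEN
        .get_or_init(|| rewrite_named_params(RESOLVE_SQL, &["objectTagName", "attributeName"]))
        .as_str()
}
```

Because the replacement is purely textual, the line count of the query is unchanged, which is the property the offline tests pin.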

read_metadata mirrors ReadMetadata (cs:149-165) byte-by-byte: signed
smallint → i16 widened to u16 for platform/engine/object IDs (matches
the .NET checked((ushort)reader.GetInt16(N)) shape), int → i32
checked-cast to i16 for property_id, nullable nvarchar for
primitive_name. read_user_profile mirrors ReadProfile (cs:76-85)
including the roles_text blob → parse_role_blob round-trip.
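
The two checked conversions can be illustrated with `TryFrom`. The helper names and the `String` error type are assumptions; the point is that out-of-range values fail instead of wrapping, as with .NET's `checked(...)` casts.

```rust
/// smallint column (i16) to u16: negative values are rejected, not wrapped.
fn smallint_to_u16(raw: i16) -> Result<u16, String> {
    u16::try_from(raw).map_err(|_| format!("smallint {raw} is negative"))
}

/// int column (i32) to i16: values outside the i16 range are rejected.
fn int_to_i16(raw: i32) -> Result<i16, String> {
    i16::try_from(raw).map_err(|_| format!("int {raw} does not fit in i16"))
}
```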

Deps added (gated): tiberius 0.12 (default-features = false; tds73 +
rustls + winauth — no chrono / rust_decimal pulled), tokio-util's
compat feature for the futures-rs ↔ tokio AsyncRead bridge,
futures-util for TryStreamExt::try_next. Default-feature build still
pulls only mxaccess-codec + async-trait + thiserror + uuid (slim
footprint preserved per the design doc's intent).

New `live` feature on this crate (`live = ["galaxy-resolver"]`) for
parity with the workspace pattern.

11 offline unit tests pin: SQL named→positional rewriting (no @named
left, @P1/@P2/@P3 present), line-count preserved, ado-string
acceptance (default Galaxy shape parses, garbage rejected), input
validation (max_rows=0 rejected, empty LIKE rejected, empty user_name
rejected, all checked before connect attempt).

Two #[cfg(feature = "live")] #[ignore]'d tests round-trip against a
real Galaxy DB (gated on MX_LIVE + MX_GALAXY_DB env vars per
tools/Setup-LiveProbeEnv.ps1). Live verification on this host:
live_resolve_test_child_object_test_int and
live_browse_test_child_object both pass against the local AVEVA
install — TestChildObject.TestInt resolves with mx_data_type=2
(Int32), is_array=false.

Closes F14 in design/followups.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:54:43 -04:00
Joseph Doherty 9501080170 [F4+F5] mxaccess-rpc: BindAck/AlterContextResponse parser + live-capture round-trip
rust / build / test / clippy / fmt (push) Has been cancelled
Adds BindAckPdu + per-result BindAckResult per [C706] §12.6.3.4: u16
result + u16 reason + 20-byte SyntaxId, preceded by port_any_t secondary
address, n_results, and 3 reserved bytes. Encode/decode handle both
PacketType::BindAck and PacketType::AlterContextResponse (same body
shape).
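
A minimal sketch of the 24-byte per-result layout (u16 result, u16 reason, 20-byte SyntaxId). It assumes little-endian integers and keeps the SyntaxId as opaque bytes; the type and function names mirror the description but the bodies are illustrative.

```rust
#[derive(Debug, PartialEq, Eq)]
struct BindAckResult {
    result: u16,
    reason: u16,
    syntax_id: [u8; 20], // 16-byte UUID + 4-byte version, kept opaque here
}

/// Decode one result entry and return the remaining bytes.
/// Assumes little-endian data representation.
fn decode_bind_ack_result(buf: &[u8]) -> Option<(BindAckResult, &[u8])> {
    if buf.len() < 24 {
        return None;
    }
    let (head, rest) = buf.split_at(24);
    let mut syntax_id = [0u8; 20];
    syntax_id.copy_from_slice(&head[4..24]);
    let entry = BindAckResult {
        result: u16::from_le_bytes([head[0], head[1]]),
        reason: u16::from_le_bytes([head[2], head[3]]),
        syntax_id,
    };
    Some((entry, rest))
}

/// Re-encode an entry; decode followed by encode is byte-identical.
fn encode_bind_ack_result(entry: &BindAckResult, out: &mut Vec<u8>) {
    out.extend_from_slice(&entry.result.to_le_bytes());
    out.extend_from_slice(&entry.reason.to_le_bytes());
    out.extend_from_slice(&entry.syntax_id);
}
```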

The new bind_ack_round_trips_live_capture test takes the first 84 bytes
of the server's first response in
captures/013-loopback-subscribe-scalars/tcp-stream-__1_49704-to-__1_55690.bin
(real BindAck observed against local AVEVA), decodes the shape
(secondary address "49704\0", n_results=2, NDR transfer syntax accepted,
DCOM negotiate_ack reason=3), then re-encodes and asserts byte-identical
to the original frame. Stronger evidence of wire parity than the prior
synthetic-frame tests, and lets the rest of the M2 stack reason about
the negotiated transfer syntax instead of relying on request-side
context to infer it.

Closes F4 and F5 in design/followups.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:44:54 -04:00
Joseph Doherty 826f7b3f89 [M5] mxaccess-asb-nettcp: F29 resolved — full canonical [MC-NBFS] table port
rust / build / test / clippy / fmt (push) Has been cancelled
The original hand-curated table was wrong starting at id 74 — entries
had been deduplicated/renumbered without preserving the canonical
`id = 2 * StringN` mapping from `[MC-NBFS]` §2.2, leaving most of
the SOAP-fault subset at the wrong ids:

  ours had Fault at 114, canonical is 134
  ours had Code at 122, canonical is 142
  ours had Reason at 124, canonical is 144
  ours had Text at 126, canonical is 146
  ours had Value at 134, canonical is 154
  ours had Subcode at 136, canonical is 156

Wire captures from the live AVEVA MxDataProvider use the canonical
ids — verified earlier via `MX_ASB_TRACE_REPLY` showing
`<resultCodeField>` correctly resolved through the F30 post-pass
once the ids matched.

Replaced the entire STATIC_ENTRIES array with a faithful port of the
first 200 entries from `dotnet/wcf`'s
`src/System.ServiceModel.Primitives/src/System/ServiceModel/
ServiceModelStringsVersion1.cs` (sourced via WebFetch — that file is
the canonical [MC-NBFS] §2.2 table mirrored in code). The wire id is
`2 * StringN` for `StringN` at 0-based position N. Coverage now spans
id 0..400, picking up the full SOAP / WS-Addressing / WS-RM /
WS-Security / WS-SecureConversation / WS-Trust / xmldsig+xenc URIs /
SAML / Kerberos / X509 token-type subset. The 436..444 xsi/xsd/nil
extras (used by .NET XmlSerializer for [MessageContract] value-type
bodies) are preserved.
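
The id mapping can be sketched as follows. The table excerpt contains only the six ids quoted above; the function names are illustrative and are not the crate's API.

```rust
/// Wire id of the static string at 0-based position N is 2 * N.
const fn static_wire_id(position: u32) -> u32 {
    2 * position
}

/// Even ids map back to a static position; odd ids belong to the dynamic dict.
fn static_position(wire_id: u32) -> Option<u32> {
    if wire_id % 2 == 0 { Some(wire_id / 2) } else { None }
}

// Excerpt only: the SOAP-fault ids listed in this commit message.
const FAULT_SUBSET: [(u32, &str); 6] = [
    (134, "Fault"),
    (142, "Code"),
    (144, "Reason"),
    (146, "Text"),
    (154, "Value"),
    (156, "Subcode"),
];

/// The invariants the regression tests pin: all ids even, strictly increasing.
fn table_is_well_formed(entries: &[(u32, &str)]) -> bool {
    entries.iter().all(|(id, _)| id % 2 == 0)
        && entries.windows(2).all(|pair| pair[0].0 < pair[1].0)
}
```

Under this mapping `Fault` at id 134 sits at position 67, and the 200-entry port tops out at id 400.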

Four new regression tests:
- ids monotonic (was already there);
- ids all even (`[MC-NBFS]` reserves odd ids for the dynamic dict);
- SOAP-fault subset (s, Fault, MustUnderstand, Code, Reason, Text,
  Node, Role, Detail, Value, Subcode) resolves to the canonical
  strings — pins the fix against accidental regression;
- `position_of_static` round-trips for known strings.

Followups:
- F29 moved to ## Resolved with full audit-trail.
- F18 M5 status block updated to strike F29 from the remaining-work
  list. The remaining open M5 items are F32 (live type-matrix beyond
  Int32/String/Bool, gated on Galaxy provisioning) and F28 (canonical
  XML signing for Read/Write/Subscribe ops, P2 latent).

Workspace: 712 unit tests pass (was 711 + 1 new fault-subset test +
existing tests now matching canonical). Clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:31:09 -04:00
Joseph Doherty 5845b5eb12 [M5] mxaccess-asb: F32 partial — Bool + String + Int32 live, longer retry budget
rust / build / test / clippy / fmt (push) Has been cancelled
Three of seven proven types now round-trip end-to-end against the
live MxDataProvider:

   Int32   (type_id 4)  — TestChildObject.TestInt = 99
   String  (type_id 10) — TestChildObject.TestString = "mxaccesscli
                            verified 17778523775" (UTF-16LE on wire)
   Bool    (type_id 17) — DelmiaReceiver_001.TestAttribute = 0

A SQL probe of the live Galaxy (`gobject ⨝ package ⨝
dynamic_attribute` grouped by `mx_data_type`) shows only types {1=Bool,
2=Int32, 5=String} have deployed instances. Float/Double/DateTime/
Duration/array shapes are not in this Galaxy, so the remaining four
type-matrix bullets in F32 are gated on Galaxy-side provisioning
that's outside the Rust port's scope. The M5 DoD #3 was always going
to bottom out at "what types are deployed in the test environment."

Code changes:
- `register_items` retry budget bumped: 10 attempts (was 5) with
  `200 * attempt` ms backoff (was 100 * attempt). Worst-case wait
  ~11 s, well within user-perceived latency on a one-shot RPC. The
  .NET reference's 5×100 ms didn't always cover the live AVEVA
  install's auth-state-commit latency on this hardware.
- `AsbClient::connect` adds a 250 ms `tokio::time::sleep` immediately
  after the one-way `AuthenticateMe` send. The server processes the
  request asynchronously; without an initial settle, the per-op retry
  loop frequently exhausts its budget on the InvalidConnectionId
  race even on the FIRST register attempt. 250 ms is short enough to
  be invisible and long enough to absorb the typical commit delay.
- `examples/asb-subscribe.rs` now prints `result_code` and `success`
  alongside the status count so the user can see when register is
  hitting the retry-exhausted state.

Live flakiness note: the AuthenticateMe race is not fully
deterministic — after many back-to-back test runs the live server
appears to degrade (presumably pending-connection table fills) and
the retry budget exhausts on EVERY tag, not just one. A 30-second
cool-down restores reliability. Production deployments with a single
long-lived session are unlikely to hit this. F32 status doc captures
the observation.

Workspace: 711 unit tests pass. Clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:21:07 -04:00
Joseph Doherty 4ddb6542e1 [M5] design: followups update — M5 functionally LIVE, F30/F31 resolved
F18 (M5 master) gains an "M5 STATUS" block right after the DoD
checklist showing the live end-to-end win (commit `9063f10`,
TestChildObject.TestInt round-trips with payload [99,0,0,0]) and
ticking each DoD bullet:
- Live `asb-subscribe` succeeds.
- ⚠️ Wire request bytes match .NET byte-for-byte; response parity
  uses the F30 dict-id resolution post-pass + chunked-Bytes
  concatenation instead of strict byte equality (functionally
  equivalent — both decode to the same logical XML).
- ⚠️ Type matrix: only Int32 verified live; Bool/Float/Double/
  String/DateTime/Duration/arrays pending sample tags. Tracked
  under new F32.
- build/test/clippy green (711 tests).

Followup churn:
- F30 + F31 moved to ## Resolved with proper "Resolved: <date>
  (commit `<hash>`)" headers. F30 was the unblocker for F31 —
  without read-side dict-id resolution we couldn't see
  `<resultCodeField>1</>` in the response.
- F28 status header updated to "PARTIALLY RESOLVED": the five
  [XmlSerializerFormat] ops (AuthenticateMe, Disconnect, KeepAlive,
  RegisterItems, UnregisterItems) plus DH params + dynamic-dict
  management all landed; Read/Write/Subscribe/Publish still sign
  over NBFX wire bytes via the legacy fallback. Severity demoted
  P0 → P2 because the live registry has empty `hashAlgorithm` and
  unsigned ops work in practice; promote back if that changes.
- F29 reaffirmed P2 (latent NBFS dict-id drift, no live impact).
- New F32 captures the type-matrix expansion as the only remaining
  P1 item for full M5 closeout.

No code change in this commit — design doc only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:08:36 -04:00
Joseph Doherty 9063f10b1b [M5] mxaccess-asb: register_items retry on InvalidConnectionId — LIVE PATH WORKS
rust / build / test / clippy / fmt (push) Has been cancelled
End-to-end live path now functional: Connect → AuthenticateMe →
RegisterItems → Read → Disconnect. The example reads back the live
TestChildObject.TestInt value (99) over the wire on the first run.

Root cause of the previous "InvalidConnectionId" mystery: it was
never an HMAC verification failure. `AuthenticateMe` is one-way
(`AsbContracts.cs:18`) and the server commits auth state
asynchronously after the request lands. A Register that follows too
quickly sees the connection in pre-authenticated state and returns
`AsbErrorCode.InvalidConnectionId` (= 1).

.NET's `MxAsbDataClient.RegisterMany` (`cs:191-204`) handles this
explicitly with a retry loop:

  for (int attempt = 1; attempt < 5
       && response.Result.ErrorCode == InvalidConnectionId; attempt++)
  {
      Thread.Sleep(TimeSpan.FromMilliseconds(100 * attempt));
      response = RegisterOnce(items);
  }

We now mirror the same pattern in `AsbClient::register_items_once`
followed by a retry loop in `register_items` — up to 5 attempts with
`100 * attempt` ms backoff.
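
The retry shape can be sketched synchronously with an injected sleep so it stays testable. The real client is async and works on full responses; here the operation returns only a result code, and `register_with_retry` is a hypothetical name.

```rust
use std::time::Duration;

const RESULT_CODE_INVALID_CONNECTION_ID: u32 = 1;

/// Call `register_once`, then retry while the server still reports
/// InvalidConnectionId, sleeping `step_ms * attempt` before each retry.
/// `max_attempts` counts the first call, matching the .NET loop shape.
fn register_with_retry<Op, Sleep>(
    mut register_once: Op,
    mut sleep: Sleep,
    max_attempts: u32,
    step_ms: u64,
) -> u32
where
    Op: FnMut() -> u32,
    Sleep: FnMut(Duration),
{
    let mut code = register_once();
    let mut attempt = 1;
    while attempt < max_attempts && code == RESULT_CODE_INVALID_CONNECTION_ID {
        sleep(Duration::from_millis(step_ms * u64::from(attempt)));
        code = register_once();
        attempt += 1;
    }
    code
}
```

Passing `std::thread::sleep` as the second argument gives the blocking behaviour; tests can pass a closure that records the requested delays instead.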

Supporting changes:
- `RegisterItemsResponse` gains `result_code: Option<u32>` +
  `success: Option<bool>` so callers can read `Result.resultCodeField`
  + `successField` from the response. `decode_register_items_response`
  now tolerates an empty `<ASBIData />` Status array (server returns
  empty when the operation fails server-side) instead of erroring
  with `MissingField`. New helper `find_text_in_named_element` walks
  the body token stream.
- New public constant `RESULT_CODE_INVALID_CONNECTION_ID = 1` for
  callers that want to detect this status outside the retry path.
- The previously-failing test
  `decode_register_items_response_returns_missing_field_when_status_absent`
  was renamed and rewritten as
  `decode_register_items_response_returns_empty_status_when_absent`
  to match the new tolerant decode contract.

F31 closed. F30 (read-side dict-id resolution, landed in `eb6c689`)
was the unblocker — without it we couldn't see the
`<resultCodeField>1</>` element in the response and the failure mode
looked like an HMAC mismatch instead of a transient retryable error.

Workspace: 711 unit tests pass. Clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:02:38 -04:00
Joseph Doherty eb6c689f09 [M5] mxaccess-asb: F30 read-side dict-id resolution + matching .NET CV xmlns
**F30 (read side):** post-pass over `body_tokens` in `decode_envelope`
substitutes `NbfxName::Static(id)` → `NbfxName::Inline(name)` and
`NbfxText::DictionaryStatic(id)` → `NbfxText::Chars(name)` whenever
the dict id resolves. Lookup tries the per-message binary header
strings first (`(id-1)/2` slot), then falls back to the cumulative
session dynamic dict, then the `[MC-NBFS]` static table for even
ids. Tokens with unresolvable ids stay opaque so trace output still
reveals them.
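
The lookup order can be sketched as below. This is an illustration under assumptions: plain slices stand in for the real dictionary types, and the session dictionary is assumed to use the same `(id-1)/2` slot scheme as the per-message header strings.

```rust
/// Resolve a dictionary id to its string, or None if it stays opaque.
/// Odd ids: per-message header strings first, then the session dynamic dict.
/// Even ids: the static table.
fn resolve_dict_id<'a>(
    id: u32,
    message_strings: &'a [String],
    session_strings: &'a [String],
    static_table: &'a [(u32, &'a str)],
) -> Option<&'a str> {
    if id % 2 == 1 {
        let slot = ((id - 1) / 2) as usize;
        message_strings
            .get(slot)
            .or_else(|| session_strings.get(slot))
            .map(String::as_str)
    } else {
        static_table
            .iter()
            .find(|(static_id, _)| *static_id == id)
            .map(|(_, name)| *name)
    }
}
```

A `None` result corresponds to leaving the token as `Static(id)` so trace output still shows the unresolved id.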

This unblocks reading the live Register response: previously every
field came back as `<b:Static(43)>false</…>` and we couldn't tell
what the server actually said. Now we see `<b:successField>false</>`
and `<b:resultCodeField>1</>` clearly. resultCode 1 maps to
`AsbErrorCode.InvalidConnectionId` (`AsbResultMapping.cs:6`) —
which means AuthenticateMe failed silently and the server discarded
our connection state, even though the crypto stack is proven
byte-equal to .NET.

**Wire CV xmlns parity:** `<h:ConnectionValidator>` for the
`XmlSerializer` mode (AuthenticateMe / Disconnect / KeepAlive) now
emits all four xmlns declarations .NET writes, in the same order:
`xmlns:h`, default `xmlns` (same value), `xmlns:xsi`, `xmlns:xsd`.
.NET emits the default xmlns redundantly even though the `h` prefix
is bound to the same URL — captured against the .NET probe via
asb-relay. This was suspected to be the AuthenticateMe HMAC blocker
but the live test still returns `InvalidConnectionId`, so the bug
is elsewhere.

**F31 updated** with the surviving hypotheses for the
`InvalidConnectionId` mystery: server-side `XmlSerializer`
constructor mismatch, subtle byte-level wire difference affecting
deserialization, or unused `ServiceAuthenticationData` from the
ConnectResponse. Resolution probably requires server-side
instrumentation or controlled-scenario byte-level HMAC diff.

Workspace: 710 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 20:47:50 -04:00
Joseph Doherty 703c540bdc [M5] mxaccess-asb: MX_ASB_TRACE_REPLY trace + F30/F31 followups
Adds env-gated diagnostic trace `MX_ASB_TRACE_REPLY` that, on every
incoming SizedEnvelope, prints the raw reply bytes + decoded body
tokens (capped at 64) before any consumer-level decode runs. Used to
isolate the next blocker after F28's wire-format fixes landed: with
canonical XML signing, registry-driven DH params, dynamic-dict id
management, ConnectionValidator wire-format-per-action, chunked
ASBIData decode, and 0x0A `ShortDictionaryXmlnsAttribute` all in
place, AuthenticateMe is accepted by the server and a real
RegisterItemsResponse comes back — but it decodes to an opaque token
stream of `<b:Static(43)>false</b:Static(43)>` etc. because every
field name is dict-encoded against the response's own binary header
pre-pop and we never resolve those ids on the read side.

Two new follow-ups capture the remaining work:
- **F30**: resolve dict-id element/attribute names on the read side.
  Mirror the F28 write-side fix: read-side dynamic dict accumulates
  session strings via `intern`, and `decode_tokens` (or a post-pass)
  needs to substitute `NbfxName::Static(id)` with the resolved
  `NbfxName::Inline(name)` so downstream `find_element_named` /
  `collect_asbidata_payloads` match.
- **F31**: server response indicates `successField=false` with an
  empty Status array on Register. Hypotheses (in order): (a) silent
  HMAC mismatch despite F23 deterministic parity; (b) request-side
  wire-byte delta the server tolerates but interprets as 0 items;
  (c) tag does not resolve in the live Galaxy state. Resolution
  needs F30 first to read the actual Status array + error codes.

Workspace: 710 unit tests pass. Clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 20:35:29 -04:00
Joseph Doherty cf97eab396 [M5] mxaccess-asb: collect_asbidata_payloads concatenates chunked Bytes records
.NET's `XmlBinaryWriter.WriteBase64` chunks the byte array into
multiple consecutive NBFX `Bytes8/16/32` records when the total
exceeds the per-record budget. Live capture of a successful .NET
RegisterItemsResponse showed the Status ASBIData payload split into
`Bytes8(78) + Bytes8WithEndElement(1)` — total 79 bytes. Our decoder
walked tokens looking for a single `Text(Bytes(...))` after each
`<ASBIData>` element and stopped at the first chunk, returning a
truncated payload that hit `Codec(ShortRead)` when the consumer
tried to decode an ItemStatus from the partial bytes.

Fix: walk **all** consecutive `Text(Bytes)` tokens after `<ASBIData>`
and concatenate into a single payload before pushing to the result
vector. Mirrors WCF's reader behaviour, which reassembles the
chunks into one byte array via `XmlReader.ReadElementContentAsBase64`.
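
The fix can be sketched with a simplified token type. The `Token` enum here is a stand-in for the real NBFX token stream, and pushing an empty payload for an element with no chunks is an assumption.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    StartElement(String),
    Bytes(Vec<u8>),
    EndElement,
}

/// For each <ASBIData> element, concatenate every consecutive Bytes token
/// that follows it into one payload.
fn collect_asbidata_payloads(tokens: &[Token]) -> Vec<Vec<u8>> {
    let mut payloads = Vec::new();
    let mut i = 0;
    while i < tokens.len() {
        let is_asbidata =
            matches!(&tokens[i], Token::StartElement(name) if name.as_str() == "ASBIData");
        i += 1;
        if is_asbidata {
            let mut payload = Vec::new();
            // Keep consuming chunks until a non-Bytes token ends the run.
            while let Some(Token::Bytes(chunk)) = tokens.get(i) {
                payload.extend_from_slice(chunk);
                i += 1;
            }
            payloads.push(payload);
        }
    }
    payloads
}
```

With the captured split of 78 bytes followed by 1 byte, this returns a single 79-byte payload rather than the truncated first chunk.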

Workspace: 710 unit tests pass. Live state: AuthenticateMe is
accepted, RegisterItemsResponse decodes structurally — the remaining
"MissingField Status" error reflects a server-side semantic outcome
(server returned empty Status array) rather than a protocol bug,
likely tag-resolution related and outside F28's scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 19:36:38 -04:00
Joseph Doherty 104efc4e9b [M5] mxaccess-asb: F28 wire-format fixes — AuthenticateMe accepted live
Three wire-level bugs surfaced by side-by-side relay capture against
the .NET probe routed via the new --via flag:

1. **Dynamic-dictionary id drift**. Our `encode_envelope` hardcoded
   action_dict_id=1 / to_dict_id=3, which is correct for the FIRST
   message in a session but wrong for every subsequent one. The
   per-session dynamic dict accumulates across messages: Connect's
   binary header pre-pops [action,to] at ids 1,3; AuthenticateMe must
   reference the new action at id 5 (continuing the odd sequence) and
   the To URL at id 3 (still in the dict from Connect). Fix uses
   `DynamicDictionary::position_of` + `intern` to compute the right
   wire id, only pre-popping strings that are NEW to the session.
   Captured against .NET probe via asb-relay: AuthenticateMe binary
   header has only one string (action) at offset 0x260 (`06 de 08 2f
   2e ...`), and `<a:Action>` value `ab 05` references the new id 5.

2. **ConnectionValidator wire format depends on operation**. .NET's
   `IAsbDataV2` declares `[XmlSerializerFormat]` on AuthenticateMe,
   Disconnect, KeepAlive (one-way ops) — those use XmlSerializer for
   the ENTIRE message including the [MessageHeader]
   ConnectionValidator. Other ops use the default
   DataContractSerializer. The wire shapes differ:
     XmlSerializer:
       `<ConnectionId xmlns="http://asb.contracts.data/20111111">guid</ConnectionId>`
       (PascalCase property name in data namespace)
     DataContract:
       `<connectionIdField xmlns="http://schemas.datacontract.org/2004/07/ArchestrAServices.ASBContract">guid</…>`
       (private "fooField" name in datacontract namespace)
   New `ValidatorWireFormat::for_action` picks the right shape per
   action; `encode_validator` now branches on it. New helpers
   `push_xml_text_field` / `push_xml_byte_array_field` for the
   XmlSerializer form. The DataContract form is preserved verbatim
   for Register/Read/Write/etc.

3. **Decoder missing 0x0A** (`ShortDictionaryXmlnsAttribute`). The
   server's RegisterItemsResponse uses `0x0A {dict-id}` to set the
   default namespace from the static dict; our decoder bailed out
   with `UnknownRecord(10)`. New decode arm produces a
   `DefaultNamespace` token with `DictionaryStatic` value.
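
The dynamic-dictionary bookkeeping from item 1 can be sketched as follows. The method names match the ones mentioned above, but the signatures (in particular `intern` returning a "newly added" flag) and the example strings are assumptions.

```rust
/// Per-session dictionary: the string at index i has odd wire id 2*i + 1.
#[derive(Default)]
struct DynamicDictionary {
    strings: Vec<String>,
}

impl DynamicDictionary {
    /// Wire id of a string already known to the session, if any.
    fn position_of(&self, s: &str) -> Option<u32> {
        self.strings
            .iter()
            .position(|known| known.as_str() == s)
            .map(|index| 2 * (index as u32) + 1)
    }

    /// Return the wire id, adding the string if it is new. The bool tells
    /// the caller whether the string must be pre-populated in the header.
    fn intern(&mut self, s: &str) -> (u32, bool) {
        if let Some(id) = self.position_of(s) {
            return (id, false);
        }
        self.strings.push(s.to_string());
        (2 * (self.strings.len() as u32 - 1) + 1, true)
    }
}
```

Interning the Connect action and To URL yields ids 1 and 3; a later AuthenticateMe action gets id 5 while the To URL resolves to the existing id 3 and is not sent again.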

**.NET probe gains a `--via` flag** (`AsbConnectionOptions.Via` →
`ChannelFactory.CreateChannel(addr, viaUri)`) so the probe can be
routed through asb-relay for byte-level capture without triggering
an `AddressFilterMismatch` fault. CoreWCF / .NET 10 dropped
`ClientViaBehavior`; the `CreateChannel(addr, via)` overload is the
modern equivalent.

Live status (this commit): Connect handshake works, AuthenticateMe
no longer faults (canonical XML + crypto + wire-format all match
.NET now), RegisterItemsResponse comes back from the server (a real
response, not a dispatcher fault). One remaining issue: our response
decoder hits `MissingField { field: "Status" }` — the server's
RegisterItemsResponse uses a slightly different element naming or
encoding than `collect_asbidata_payloads` expects. Next iteration
hunts that.

Workspace: 710 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 19:29:48 -04:00
Joseph Doherty ce27b63010 [M5] auth: deterministic HMAC fixture test rules out crypto stack
Adds end-to-end byte-equality test against a .NET reference fixture
captured via the new `MxAsbClient.Probe --dump-deterministic-hmac`
flag. All inputs are pinned (passphrase, prime, generator, private-
key bytes, remote-pub bytes, message number, connection ID, AES IV,
consumer-data + IV bytes), so the test reproduces .NET's exact
output for every crypto step:

1. shared = remote_pub^private_key mod prime: matches
2. crypto_key = shared || passphrase_utf8: matches
3. hmac = HMAC-SHA1(crypto_key, xml_utf8): matches
4. aes_key = PBKDF2-SHA1(base64(crypto_key), salt, 1000, 16): matches
5. encrypted_mac = AES-CBC(aes_key, iv=zeros, hmac, PKCS7): matches
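
Steps 1 and 2 can be illustrated with toy numbers. This is a sketch only: it uses `u64` values instead of the multi-hundred-bit primes a real install uses, the byte encoding of the shared secret is left to the caller, and steps 3 to 5 need crypto crates that are omitted here.

```rust
/// Square-and-multiply modular exponentiation; `modulus` must be non-zero.
fn mod_pow(base: u64, mut exp: u64, modulus: u64) -> u64 {
    let m = u128::from(modulus);
    let mut result: u128 = 1 % m;
    let mut b = u128::from(base) % m;
    while exp > 0 {
        if exp & 1 == 1 {
            result = result * b % m;
        }
        b = b * b % m;
        exp >>= 1;
    }
    result as u64
}

/// Step 2: crypto_key = shared || passphrase_utf8 (plain concatenation).
fn crypto_key(shared: &[u8], passphrase: &str) -> Vec<u8> {
    [shared, passphrase.as_bytes()].concat()
}
```

With prime 23 and generator 5, private exponents 6 and 15 produce public values 8 and 19, and both sides derive the same shared value 2.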

This conclusively rules out the entire crypto stack as the source of
the live AuthenticateMe `dispatcher/fault`. Our DH math, HMAC engine,
PBKDF2 derivation, AES-CBC PKCS7, and crypto_key concatenation are
byte-equal to .NET. The remaining live failure must come from one
of: (a) wire-level ConnectionValidator NBFX shape (DataContract field
names, mustUnderstand attribute, namespace), (b) WCF binary message
header (action+to dict pre-pop), or (c) a subtle XmlSerializer quirk
for live values that the hardcoded fixtures don't exercise (Guid
format edge case, base64 line wrapping, ulong text rendering).

Fixture lives at `crates/mxaccess-asb-nettcp/tests/fixtures/
deterministic-hmac/authenticate-me.kv` (KV format, ASCII hex, lines
trim CRLF/LF transparently). The companion `README.md` documents the
capture procedure and the per-step decomposition. The test consumes
the .NET-supplied canonical XML directly from the fixture's
`xml_utf8_b64` so a Rust XML emitter bug would not mask a Rust
crypto bug — XML byte-equality is verified separately by
`mxaccess-asb::xml_canonical::tests` against the `signed-xml/*.xml`
fixtures.

Workspace: 710 unit tests pass (was 709 + 1 new). Clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 19:12:17 -04:00
Joseph Doherty 42ac10a88f [M5] design: F28 follow-up update with progress + remaining blocker
Updates F28 with:
- Captured-fixtures section now lists all 6 shapes (added the
  empty-MAC variant) and 10 inferred XmlSerializer rules including
  the empty-byte-array → self-closing-element rule we discovered.
- New "Emitter landed" section pointing at commit `f14580e` and the
  five exposed `emit_*` functions, plus the
  `AsbAuthenticator::peek_next_message_number` plumbing.
- New "Registry-driven DH params" section explaining why
  `CryptoParameters::defaults()` was insufficient for live testing
  (live AVEVA installs use 768-bit primes; default is 1024-bit) and
  documenting the new MX_ASB_DH_* env-var contract.
- New "Remaining live blocker" section documenting that AuthenticateMe
  still faults despite canonical XML byte-equality and registry-correct
  DH params — most likely a byte-level HMAC/AES discrepancy that needs
  a deterministic-input unit-test triple to pin down.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 17:38:23 -04:00
Joseph Doherty fd38189f43 [M5] auth+probe: env-gated crypto-key/AES-key trace for F28 follow-up
Adds diagnostic traces in both the Rust authenticator and the .NET
reference (under MX_ASB_TRACE_DERIVE / sharedTrace) that dump:
- crypto_key length + hex + base64 (shared || passphrase)
- derived AES key hex (PBKDF2-SHA1, 16 bytes)

Used to confirm during the F28 live-bring-up reconciliation that:
1. crypto_key passphrase suffix bytes [96..176] match between Rust and
   .NET — both read the same registry passphrase, both UTF-8-encode.
2. crypto_key shared_secret prefix bytes [0..96] DIFFER per run because
   each session has its own random DH private exponent. This is
   expected; what matters is the client+server agreement on the value
   for a single session, which the wire-tested DH math should produce
   given correct prime/generator/private-key handling.

Both traces are gated:
- Rust: `MX_ASB_TRACE_DERIVE=1` env var.
- .NET: `Action<string>? sharedTrace` field, populated when the
  authenticator is constructed with a non-null trace callback (the
  probe's `Console.WriteLine` shim wires this up by default).
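
For orientation, a minimal sketch of the `shared || passphrase` layout and
an env-gated trace line. The function names and the trace-line format are
hypothetical, not the authenticator's actual code; only the concatenation
order, the UTF-8 passphrase encoding and the `MX_ASB_TRACE_DERIVE=1` gate
come from the notes above. The PBKDF2 step is omitted because its salt and
iteration count are not stated here.

```rust
use std::fmt::Write as _;

/// Builds `shared || passphrase`: DH shared secret first, then the
/// UTF-8 bytes of the registry passphrase.
fn build_crypto_key(shared_secret: &[u8], passphrase: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(shared_secret.len() + passphrase.len());
    key.extend_from_slice(shared_secret);
    key.extend_from_slice(passphrase.as_bytes());
    key
}

/// Lower-case hex rendering used for the trace output.
fn to_hex(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // Writing into a String cannot fail.
        let _ = write!(out, "{b:02x}");
    }
    out
}

/// Returns a trace line only when the gate value is exactly "1".
/// The line format is an assumption made for this sketch.
fn trace_line(gate: Option<&str>, crypto_key: &[u8]) -> Option<String> {
    match gate {
        Some("1") => Some(format!(
            "crypto_key len={} hex={}",
            crypto_key.len(),
            to_hex(crypto_key)
        )),
        _ => None,
    }
}
```

A caller would pass `std::env::var("MX_ASB_TRACE_DERIVE").ok().as_deref()`
as the gate. With a 96-byte shared secret and an 80-byte passphrase the
key is 176 bytes, matching the `[0..96]` / `[96..176]` split above.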

Workspace: 709 tests still pass. No public-API changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 17:37:22 -04:00
Joseph Doherty f14580e0db [M5] mxaccess-asb: F28 canonical-XML signing wired + registry-driven DH params
Adds `xml_canonical` module that emits XmlSerializer-compatible canonical
XML for the five primary `ConnectedRequest` shapes (AuthenticateMe,
Disconnect, KeepAlive, RegisterItemsRequest, UnregisterItemsRequest).
Six fixture-comparison tests verify byte-exact match against captured
.NET output, including the empty-MAC-IV variant that the live signing
flow uses (`authenticate-me-empty-mac-iv.xml`, 896 bytes; new
`emit_data_ns_byte_array` helper picks self-closing form for empty
byte[]).
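
A rough sketch of the empty-byte-array rule, for readers without the
source open. `emit_byte_array_sketch` is a hypothetical stand-in, not the
real `emit_data_ns_byte_array` (whose signature is not shown here); the
namespace is passed in by the caller, and the exact self-closing spelling
(`<Name xmlns="..." />` with a space before `/>`) is an assumption based
on typical .NET XML writer output.

```rust
/// Standard base64 (RFC 4648, with padding), inlined so the sketch has
/// no external dependencies.
fn base64(data: &[u8]) -> String {
    const T: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::with_capacity((data.len() + 2) / 3 * 4);
    for chunk in data.chunks(3) {
        let b1 = *chunk.get(1).unwrap_or(&0);
        let b2 = *chunk.get(2).unwrap_or(&0);
        let n = (u32::from(chunk[0]) << 16) | (u32::from(b1) << 8) | u32::from(b2);
        out.push(T[(n >> 18) as usize & 63] as char);
        out.push(T[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { T[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { T[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Emits one byte[] element with a namespace redeclaration.
/// Empty input yields the self-closing form; anything else is base64.
fn emit_byte_array_sketch(name: &str, ns: &str, data: &[u8]) -> String {
    if data.is_empty() {
        format!("<{name} xmlns=\"{ns}\" />")
    } else {
        format!("<{name} xmlns=\"{ns}\">{}</{name}>", base64(data))
    }
}
```

The point of the branch is that an empty MAC or IV must not be written as
`<Name ...></Name>`, since the HMAC input is compared byte-for-byte.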

Plumbing: `AsbAuthenticator::peek_next_message_number` exposes the
pre-allocated message number; `AsbClient::send_signed_envelope[_one_way]`
gain an `xml_for_signing: Option<&[u8]>` parameter. `connect`,
`disconnect`, `keep_alive`, `register_items`, `unregister_items` now
build a pre-signing `ConnectionValidator` (empty MAC + IV) + emit the
canonical XML + pass the bytes through to HMAC. Other ops (Read, Write,
Subscription) keep the legacy NBFX-bytes path until F28 expands to
cover their request shapes.

Live-bring-up wiring:
- `tools/Get-AsbPassphrase.ps1` now exports `MX_ASB_DH_PRIME`,
  `MX_ASB_DH_GENERATOR`, `MX_ASB_DH_HASH_ALGORITHM` (always — even when
  empty, so the example can distinguish "no env var" from "registry
  says empty"), and `MX_ASB_DH_KEY_SIZE`.
- `examples/asb-subscribe.rs` honours those env vars to override
  `CryptoParameters::defaults()`. Each AVEVA install picks its own DH
  group at provisioning time (768-bit prime is typical, vs the .NET
  reference's 1024-bit fallback that we previously hardcoded). Empty
  hashAlgorithm in the registry maps to `HashAlgorithm::Unrecognised`,
  matching `AsbSystemAuthenticator.CreateHmac:84-93` semantics where
  empty + forceHmac=true → HMAC-SHA1.
- `MxAsbClient.Probe --dump-signed-xml` flag (added in an earlier commit)
  now traces the live HMAC inputs (`asb.sign.xml-utf8-len`,
  `asb.sign.xml-b64`, `asb.sign.hmac-b64`, etc.) so the Rust port can
  diff its canonical XML against .NET's byte-for-byte for any live
  scenario (env-driven via `Action<string>? sharedTrace`).
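
Why `MX_ASB_DH_HASH_ALGORITHM` is exported even when empty can be
sketched as a three-way decision. `HashChoice` and
`hash_choice_from_env` are hypothetical names for illustration; the real
example maps onto `CryptoParameters` and `HashAlgorithm`, whose exact
shapes are not reproduced here.

```rust
/// Three states the example needs to tell apart.
#[derive(Debug, PartialEq, Eq)]
enum HashChoice {
    /// Env var absent: fall back to the built-in defaults.
    UseDefaults,
    /// Env var present but empty: the registry value itself is empty.
    Unrecognised,
    /// Env var present with a value: use the named algorithm.
    Named(String),
}

/// Classifies the raw env-var value (None = variable not set).
fn hash_choice_from_env(value: Option<&str>) -> HashChoice {
    match value {
        None => HashChoice::UseDefaults,
        Some("") => HashChoice::Unrecognised,
        Some(name) => HashChoice::Named(name.to_owned()),
    }
}
```

A caller would feed it
`std::env::var("MX_ASB_DH_HASH_ALGORITHM").ok().as_deref()`. If the
script skipped the export for empty values, the first two arms would be
indistinguishable.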

Wire-format alignment for `XmlSerializer` parity:
- `ItemIdentity::default()` and `absolute_by_name` now use
  `Some(String::new())` for nullable strings (matches .NET's
  `CreateAbsoluteItem`, which sets `ContextName = string.Empty`, not
  null).
- `read_unicode_string` returns `Some(String::new())` for length 0
  rather than `None`, mirroring .NET's `AsbBinary.ReadUnicodeString`,
  which returns `string.Empty` when `byteLength == 0`. The wire format
  genuinely cannot distinguish null from empty (both encode as 4 bytes
  of zero); callers that need to preserve the distinction MUST track it
  in their domain types before encoding.
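
A minimal sketch of the length-0 behaviour. It assumes a 4-byte
little-endian byte-length prefix followed by UTF-16LE code units; that
layout and the function name are assumptions for illustration, and the
real `read_unicode_string` may differ in signature and error type.

```rust
/// Reads one length-prefixed string and returns it with the number of
/// bytes consumed. A zero length decodes to `Some(String::new())`,
/// never `None`, because the wire cannot express the difference.
fn read_unicode_string_sketch(buf: &[u8]) -> Result<(Option<String>, usize), String> {
    if buf.len() < 4 {
        return Err("truncated length prefix".to_string());
    }
    let byte_len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    if byte_len == 0 {
        // Four zero bytes: empty string, by convention.
        return Ok((Some(String::new()), 4));
    }
    if byte_len % 2 != 0 {
        return Err("odd byte length for UTF-16 data".to_string());
    }
    let end = 4usize
        .checked_add(byte_len)
        .ok_or_else(|| "length overflow".to_string())?;
    let body = buf
        .get(4..end)
        .ok_or_else(|| "truncated string body".to_string())?;
    // Reassemble little-endian 16-bit code units.
    let units: Vec<u16> = body
        .chunks_exact(2)
        .map(|c| u16::from_le_bytes([c[0], c[1]]))
        .collect();
    let text = String::from_utf16(&units).map_err(|e| e.to_string())?;
    Ok((Some(text), end))
}
```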

Live status (post-fix): Connect handshake completes end-to-end. The
canonical XML our emitter produces matches .NET's structure byte-for-
byte (verified by fixture comparison). DH prime/generator/hash now
match the live registry values. Despite all this, AuthenticateMe
still produces a generic dispatcher fault on the server — there's at
least one more subtle wire-byte or crypto mismatch that needs
isolation. F28 stays open with that note.

Workspace: 709 unit tests pass (702 previously, plus 7 new xml_canonical tests).
Clippy: clean (`-D warnings`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 17:31:31 -04:00
Joseph Doherty dbb580b2c8 [M5] tools+fixtures: F28 canonical-XML signing target captured from .NET
Adds `MxAsbClient.Probe --dump-signed-xml` flag that builds five
ConnectedRequest shapes (AuthenticateMe, Disconnect, KeepAlive,
RegisterItemsRequest, UnregisterItemsRequest) with deterministic
field values and prints `AsbSerialization.ToXml(...)` output. The
output is exactly what `AsbSystemAuthenticator.Sign` HMACs
(`AsbSystemAuthenticator.cs:79`), so the Rust port's canonical-XML
emitter must produce byte-identical bytes for HMAC parity.

Captured fixtures land under
`rust/crates/mxaccess-asb/tests/fixtures/signed-xml/`:
- `authenticate-me.xml` — 1000 bytes
- `disconnect.xml` — 980 bytes
- `keep-alive.xml` — 705 bytes
- `register-items.xml` — 1068 bytes
- `unregister-items.xml` — 1072 bytes

Plus a `README.md` documenting 10 inferred XmlSerializer rules,
including: element name = class name, not WrapperName; field order =
declaration order, not [MessageBodyMember.Order]; `[XmlType.Namespace]`
on a field type causes a per-child xmlns redeclaration on the children,
not the wrapper; an `XxxSpecified` member controls whether `Xxx` is
emitted; output uses CRLF + 2-space indent + a utf-16 declaration, but
UTF-8 bytes are fed to HMAC.

`.gitattributes` unsets the `text` attribute on the XML fixtures
(`*.xml -text`) so Git performs no end-of-line conversion on them,
regardless of `core.autocrlf`. CRLF is part of the canonical form and
must survive a round-trip through Git untouched.

`MxAsbClient.csproj` gains
`<InternalsVisibleTo Include="MxAsbClient.Probe" />` so the probe can
reach the internal `AsbSerialization` helper without making it public.

Workspace: 702 tests pass (no Rust changes — fixtures only).
F28 follow-up updated with the captured fixtures + the inferred rules.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 16:35:45 -04:00
Joseph Doherty d1e887b91b [M5] mxaccess-asb-nettcp/asb: Connect handshake live + SOAP fault detection
Live-bring-up reconciliation against AVEVA's MxDataProvider on Windows.
Connect now completes end-to-end (real DH key exchange, apollo:V2
encryption, ServicePublicKey/ServiceAuthenticationData populated). Five
fixes land:

1. NBFX `PrefixElement_a..z` (0x5E-0x77) and `PrefixAttribute_a..z`
   (0x26-0x3F) decode + encode arms. The server's ConnectResponse hit
   `0x65 = PrefixElement_h` for a dynamically-named element and our
   decoder bailed with `unknown NBFX record byte 0x65`. Both directions
   now round-trip; encoder picks short-form when prefix is a single
   lowercase ASCII letter.

2. xmlns redeclaration on `<Data>` AND `<InitializationVector>` inside
   `AuthenticationData` / `PublicKey`. `[XmlType(Namespace = ...)]` on
   AuthenticationData / PublicKey (`AsbContracts.cs:350-381`) means
   XmlSerializer emits `xmlns="..."` on each direct child. The default-
   ns scope ends at `</Data>`, so `<InitializationVector>` needs its own
   redeclaration to stay in the data namespace; without it the server
   fell back to messages-namespace and the deserialiser threw an
   `InternalServiceFault`.

3. SOAP-fault detection in `AsbClient::send_envelope`. New
   `ClientError::SoapFault { action, code, reason }` surfaces when the
   response Action header matches the canonical `dispatcher/fault`
   template; previously body decoders blindly ran and surfaced
   `MissingField { field: "Status" }` masking the actual fault. Reason
   text is extracted as the longest `NbfxText::Chars` in the body —
   robust against the `nbfs.rs` static-dictionary id mismatches.

4. Identified blocker (filed as F28): signed-request HMAC currently
   covers the NBFX wire bytes, but .NET's `AsbSystemAuthenticator.Sign`
   HMACs `Encoding.UTF8.GetBytes(request.ToXml())` — the canonical XML
   serialisation via `XmlSerializer` with namespace
   `urn:invensys.schemas` (`AsbSerialization.cs:12-48`). Until the Rust
   port emits identical XML bytes for `ConnectedRequest` subclasses,
   AuthenticateMe / RegisterItems / every signed RPC fault on the
   server. Connect itself is unsigned (`ServiceMessage` not
   `ConnectedRequest`) which is why it works today.

5. Identified `nbfs.rs` static-dictionary id drift (filed as F29): wire
   uses Fault=134/Code=142/Reason=144/Text=146/Value=154/Subcode=156
   but our table has them at 114/122/124/126/134/136. Off by 20 from
   id 114+ — 10 missing entries between `s` (id 112) and `Fault`. No
   request-side impact (we only encode IDs ≤44, all correct); the SOAP
   fault decode walks text records directly so it sidesteps the issue.
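
The prefix-record ranges from item 1 can be sketched as below. The type
and function names are hypothetical; only the byte ranges and the
single-lowercase-letter rule for the short form come from the fix
description.

```rust
/// A single-letter prefix record recovered from its NBFX record byte.
#[derive(Debug, PartialEq, Eq)]
enum PrefixRecord {
    Element(char),
    Attribute(char),
}

/// Maps a record byte to its prefix letter, or None when the byte is
/// outside both single-letter prefix ranges.
fn decode_prefix_record(byte: u8) -> Option<PrefixRecord> {
    match byte {
        // PrefixElement_a..z occupies 0x5E..=0x77.
        0x5E..=0x77 => Some(PrefixRecord::Element((b'a' + (byte - 0x5E)) as char)),
        // PrefixAttribute_a..z occupies 0x26..=0x3F.
        0x26..=0x3F => Some(PrefixRecord::Attribute((b'a' + (byte - 0x26)) as char)),
        _ => None,
    }
}

/// Encoder side: the short form applies only when the prefix is exactly
/// one lowercase ASCII letter.
fn short_form_element_byte(prefix: &str) -> Option<u8> {
    match prefix.as_bytes() {
        [c @ b'a'..=b'z'] => Some(0x5E + (*c - b'a')),
        _ => None,
    }
}
```

With this mapping, `0x65` decodes to the element prefix `h`, the byte
that previously produced `unknown NBFX record byte 0x65`.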

Workspace: 702 tests pass (no test count delta — wire-only fixes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 16:29:12 -04:00