mxaccess/docs/Managed-LMX-NMX-Capture-Plan.md
Joseph Doherty fe2a6db786
Initial project state: .NET reference, design, Rust port (M0+M1), evidence
Layout:
- src/                    .NET 10 x64 reference: MxNativeCodec, MxNativeClient,
                          MxAsbClient, probes, tests, harnesses. Executable spec.
- design/                 Architectural plan for the Rust port (M0–M6), error
                          model, protocol invariants, risks (R1–R16), adversarial
                          review log (review.md).
- rust/                   Rust workspace. M0 skeleton + M1 codec parity.
                          mxaccess-codec: 215 unit tests + 2 cross-implementation
                          parity tests (byte-identical against .NET reference).
                          Other crates are M0 stubs awaiting M2+.
- captures/               Frida + netsh + pcap evidence per CLAUDE.md
                          ("captures are evidence, not throwaway logs").
- analysis/               Decompiled C# (frida/proxy/decompiled-*),
                          Ghidra exports for native DLLs (`exports/` only —
                          working state at `projects/` and AVEVA's input
                          binaries at `input/` are gitignored).
- docs/                   Reverse-engineering reference docs.
- tools/                  Setup-LiveProbeEnv.ps1 (Infisical credential fetcher),
                          Compute-Crc.ps1 (.NET parity helper).
- .github/workflows/      Rust CI: fmt + build + test + clippy on Windows.
- LICENSE                 MIT (Joseph Doherty, 2026).

Verified:
- cargo test --workspace → 217 passed (215 unit + 2 .NET parity), 0 failed
- cargo clippy --workspace -- -D warnings → clean
- cargo fmt --all -- --check → clean
- cargo publish --dry-run -p mxaccess-codec → packages cleanly

Excluded from history (see .gitignore):
- **/bin, **/obj, **/target — build artifacts
- analysis/ghidra/projects/ — Ghidra working state (regenerable)
- analysis/ghidra/input/ — AVEVA proprietary DLLs (vendor IP)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 06:21:00 -04:00

# Managed LMX/NMX capture plan
Goal: build a fully managed .NET 10 x64 DLL that talks to AVEVA/Wonderware
LMX/NMX without loading the x86 MXAccess COM/native stack.
Assumption: licensing is not a blocker for this environment. The capture plan
still records licensing and security behavior because the managed client must
match runtime behavior for secured writes, permission failures, and audit paths.
## Success criteria
The managed DLL is viable when it can perform these operations from an x64
.NET 10 process using only managed code:
- discover or configure the local Galaxy/runtime endpoint,
- resolve a Galaxy attribute reference such as `Object.Attribute`,
- read current value, timestamp, quality, and status,
- subscribe and receive initial value plus subsequent data changes,
- write standard values and receive definitive completion status,
- preserve bad-quality/status semantics when a platform or engine stops,
- handle reconnect, heartbeat, runtime restart, and deploy/undeploy churn,
- map common MXAccess errors to deterministic managed status codes.
## Known protocol gaps
Static analysis gave us interfaces and routing, but not enough wire/message
detail. These are the missing pieces to capture:
| Area | Missing detail |
| --- | --- |
| NMX session startup | version negotiation, anonymous engine allocation, heartbeat setup, local/remote ids |
| Endpoint discovery | how local Galaxy/platform/engine ids are chosen or discovered |
| Item resolution | request body for `AddItem`, `AddItem2`, invalid-reference response |
| Subscription | request body for `Advise`, `AdviseSupervisory`, buffered items, initial value behavior |
| Data update | payload format for value, type, quality, timestamp, `MXSTATUS_PROXY` |
| Reads | whether MXAccess implements reads only as transient subscriptions, or whether lower-level `IDataClient.Read2` messages can be used directly |
| Writes | standard write body, timestamped write, array write, status and completion correlation |
| Security | secured write, verified write, user id/GUID mapping, denied-write status |
| Lifecycle | unregister/unadvise/remove cleanup messages |
| Failure modes | platform down, engine down, bad reference, repository busy, wrong type, permission denied |
## Phase 1: controlled MXAccess harness
Build a tiny x86 harness that uses the primary MXAccess stack exactly as
production does:
```text
ArchestrA.MXAccess.dll -> LMXProxy.LMXProxyServer -> LmxProxy.dll
```
Harness requirements:
- target `net48`, `x86`,
- single STA thread,
- deterministic client name, for example `MxProtoTraceHarness`,
- configurable tag list and write values,
- timestamps every high-level API call,
- logs:
  - `Register` handle,
  - every `AddItem` item handle,
  - every `AdviseSupervisory`,
  - every `OnDataChange`,
  - every `Write`,
  - every `OnWriteComplete`,
  - every status array,
  - cleanup calls.
Initial scenarios:
1. Register/unregister only.
2. Add/remove one known-good read-only item.
3. Subscribe/unsubscribe one known-good item.
4. One-shot read pattern: add, advise, wait first callback, unadvise, remove.
5. Write one boolean or integer setpoint and capture completion.
6. Add invalid item reference.
7. Subscribe to a stopped/unavailable runtime host probe if available.
Artifacts:
- harness source,
- timestamped harness log,
- Process Monitor trace,
- network trace,
- optional API Monitor trace.
Exit criteria:
- one repeatable capture set where every high-level harness operation can be
correlated to lower process/network activity.
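One way to check this exit criterion is to pair each harness log entry with the packets seen shortly after it. The following is a minimal sketch, assuming the harness log and the network trace have already been reduced to timestamps in seconds on a shared clock. The function name `correlate`, the 0.25 s window, and the sample data are illustrative assumptions, not values taken from the captures.

```python
from bisect import bisect_left


def correlate(operations, packet_times, window_s=0.25):
    """Map each harness operation to packet timestamps that follow it.

    operations:   list of (timestamp_seconds, operation_name) from the harness log
    packet_times: sorted list of packet timestamps in seconds
    window_s:     assumed correlation window; packets in [t, t + window_s) match
    """
    result = {}
    for op_time, name in operations:
        start = bisect_left(packet_times, op_time)
        end = bisect_left(packet_times, op_time + window_s)
        result[(op_time, name)] = packet_times[start:end]
    return result


# Illustrative data only; real values come from harness.log and network.pcapng.
ops = [(10.000, "Register"), (12.000, "AddItem")]
packets = [10.010, 10.020, 11.500, 12.005]
matches = correlate(ops, packets)

# Operations with no packets in their window need a closer look.
uncorrelated = [name for (_, name), hits in matches.items() if not hits]
```

An operation that lands in `uncorrelated` suggests that its traffic travels on a layer the trace did not see, which is itself useful evidence for Phase 2.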
## Phase 2: process and network capture
Capture layers:
1. Process Monitor:
   - processes: harness, `NmxSvc.exe`, `aaBootstrap`, relevant AppEngine/Platform processes,
   - operations: registry, file, process/thread, TCP/UDP where available,
   - objective: confirm DLLs, config, registry keys, and runtime dependencies touched per operation.
2. Network:
   - filter: host IP plus TCP/UDP port `5026`,
   - tools: Wireshark and/or `netsh trace`,
   - objective: recover packet framing and message timing.
3. COM/API call tracing:
   - target `LmxProxy.dll`, `Lmx.dll`, `NmxAdptr.dll`, `NmxSvc.exe`,
   - APIs of interest:
     - COM object creation,
     - `NmxSvc.NmxService.TransferData`,
     - `NmxAdptr.INmx4.PutRequest2`,
     - `NmxAdptr.INmx4.GetResponse2`,
     - `IDataClient.*` if used,
     - Winsock `sendto`, `recvfrom`, TCP send/recv.
Capture naming:
```text
captures/
  001-register/
  002-additem-good/
  003-subscribe-good/
  004-write-good/
  005-additem-invalid/
```
Each folder should contain:
```text
harness.log
procmon.pml
network.pcapng
notes.md
```
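A small completeness check can catch capture folders that are missing one of these four files before analysis starts. This is a sketch only: the function name `missing_artifacts` is hypothetical, and it assumes every immediate subdirectory of the capture root is a scenario folder.

```python
from pathlib import Path

# The four artifacts each capture folder should contain.
REQUIRED_FILES = ("harness.log", "procmon.pml", "network.pcapng", "notes.md")


def missing_artifacts(captures_root):
    """Return {folder_name: [missing file names]} for incomplete capture folders."""
    report = {}
    for folder in sorted(Path(captures_root).iterdir()):
        if not folder.is_dir():
            continue  # ignore stray files next to the scenario folders
        missing = [name for name in REQUIRED_FILES if not (folder / name).is_file()]
        if missing:
            report[folder.name] = missing
    return report
```

Calling `missing_artifacts("captures")` returns an empty dictionary when every scenario folder is complete.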
Exit criteria:
- identify which layer carries the actual item-resolution, subscribe, update,
and write messages.
- know whether the best implementation target is:
  - direct NMX socket protocol,
  - local COM `NmxSvc` automation,
  - lower `Lmx.dll` `IDataClient` behavior replicated over sockets.
## Phase 3: schema reconstruction
Start with messages that have simple cause/effect:
1. `Register` / startup heartbeat.
2. `AddItem` good versus invalid reference.
3. `AdviseSupervisory` initial value.
4. `Write` and `OnWriteComplete`.
For each message:
- identify envelope:
  - magic/version,
  - message type,
  - source galaxy/platform/engine id,
  - destination galaxy/platform/engine id,
  - request handle/correlation id,
  - payload length,
  - checksum if any.
- identify payload:
  - item reference string encoding,
  - context string,
  - item handle/id,
  - Mx data type,
  - quality,
  - FILETIME timestamp,
  - status category/source/detail.
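Of the payload fields above, the FILETIME timestamp is the easiest to confirm, because candidate 8-byte fields can be decoded and compared against the harness log. A Windows FILETIME counts 100-nanosecond intervals since 1601-01-01 UTC. The sketch below assumes the field is stored as an unsigned 64-bit little-endian integer, which is not yet confirmed for NMX payloads; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# FILETIME epoch: 1601-01-01 00:00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime):
    """Convert a FILETIME tick count (100 ns units) to an aware UTC datetime.

    Sub-microsecond precision is dropped because datetime cannot represent it.
    """
    if filetime < 0:
        raise ValueError("FILETIME must be non-negative")
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def filetime_from_le_bytes(raw):
    """Decode 8 bytes as a little-endian FILETIME (ASSUMED byte order)."""
    if len(raw) != 8:
        raise ValueError("FILETIME field must be exactly 8 bytes")
    return int.from_bytes(raw, "little")


# The Unix epoch expressed as FILETIME ticks is a convenient sanity check.
UNIX_EPOCH_TICKS = 116444736000000000
unix_epoch = filetime_to_datetime(UNIX_EPOCH_TICKS)
```

A candidate field that decodes to a time within a few seconds of the matching harness log entry is a strong hint that it is the update timestamp.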
Use differential captures:
- same operation, different tag name,
- same tag, different value,
- valid tag versus invalid tag,
- boolean versus integer versus string,
- normal write versus denied write.
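Differential captures are easiest to read as a list of byte ranges that changed between two frames. The following sketch compares two frames offset by offset; the function name `differing_ranges` and the sample bytes are made up for illustration and are not captured NMX traffic.

```python
def differing_ranges(frame_a, frame_b):
    """Return (start, end) offset ranges, end exclusive, where two frames differ.

    Bytes beyond the end of the shorter frame count as differing.
    """
    longest = max(len(frame_a), len(frame_b))
    ranges = []
    start = None
    for offset in range(longest):
        a = frame_a[offset] if offset < len(frame_a) else None
        b = frame_b[offset] if offset < len(frame_b) else None
        if a != b:
            if start is None:
                start = offset  # open a new differing run
        elif start is not None:
            ranges.append((start, offset))  # close the current run
            start = None
    if start is not None:
        ranges.append((start, longest))
    return ranges


# Made-up frames: one byte changes at offset 5, and frame_b has one extra byte.
frame_a = bytes.fromhex("0102030400100708")
frame_b = bytes.fromhex("0102030400200708ff")
changed = differing_ranges(frame_a, frame_b)
```

Ranges that change only when the tag name changes point at the item reference, and ranges that change only with the value point at the value encoding.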
Exit criteria:
- documented binary structs for the minimum viable read/subscribe/write path,
- sample parser that can decode captured traffic into structured JSON.
## Phase 4: managed transport prototype
Build a new .NET 10 x64 prototype library:
```text
src/ManagedLmxNmx/
  LmxNmxClient.cs
  NmxTransport.cs
  NmxFrame.cs
  LmxMessages.cs
  MxValueCodec.cs
  MxStatus.cs
```
Prototype scope:
- connect to local NMX endpoint,
- perform startup/version/heartbeat,
- resolve one item,
- subscribe to one item,
- decode updates,
- write one item,
- disconnect cleanly.
Hard rules:
- no COM,
- no native P/Invoke except optional OS primitives already wrapped by .NET,
- no x86 process dependency,
- deterministic timeouts and cancellation,
- all binary parsers bounds-checked.
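The bounds-checking rule does not depend on the implementation language, so it is sketched here in Python for brevity. The frame layout used (a 16-bit type, a 32-bit little-endian payload length, then the payload) is purely hypothetical, since the real NMX envelope has not been reconstructed yet. The class and function names are illustrative as well.

```python
import struct


class FrameError(ValueError):
    """Raised when a frame is truncated or declares an implausible length."""


class BoundedReader:
    """Cursor over a byte buffer that refuses to read past the end."""

    def __init__(self, data):
        self._data = memoryview(data)
        self._pos = 0

    def remaining(self):
        return len(self._data) - self._pos

    def take(self, count):
        if count < 0 or count > self.remaining():
            raise FrameError(
                f"need {count} bytes at offset {self._pos}, have {self.remaining()}"
            )
        chunk = bytes(self._data[self._pos:self._pos + count])
        self._pos += count
        return chunk

    def u16(self):
        return struct.unpack("<H", self.take(2))[0]

    def u32(self):
        return struct.unpack("<I", self.take(4))[0]


def parse_hypothetical_frame(data, max_payload=65536):
    """Parse the HYPOTHETICAL layout: u16 type, u32 payload length, payload."""
    reader = BoundedReader(data)
    message_type = reader.u16()
    length = reader.u32()
    if length > max_payload:
        # Reject before taking any payload bytes; a hostile length never drives a read.
        raise FrameError(f"declared payload length {length} exceeds {max_payload}")
    payload = reader.take(length)
    return {"type": message_type, "payload": payload.hex(), "trailing": reader.remaining()}
```

Every read goes through `take`, so a truncated or malformed frame fails with one well-defined error instead of an out-of-range access. The managed .NET parsers can follow the same shape.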
Exit criteria:
- x64 test app receives the same value/quality/timestamp as MXAccess for a known tag.
- x64 test app can write and observe `OnWriteComplete` equivalent status.
## Phase 5: parity suite
Create side-by-side tests:
```text
MXAccess x86 harness result == Managed x64 result
```
Parity matrix:
| Scenario | Expected |
| --- | --- |
| Good boolean read | same value, quality, timestamp within tolerance |
| Good numeric read | same value/type/quality |
| Good string read | same value/type/quality |
| Invalid reference | same category/source/detail |
| Subscribe initial callback | same initial value behavior |
| Subscribe change callback | same change behavior |
| Write allowed | same completion success |
| Write wrong type | same failure detail |
| Write denied | same security failure detail |
| Platform stopped | same bad quality/status behavior |
| Engine restart | reconnect and resubscribe |
| Deploy/undeploy busy | same busy status |
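A parity check for the read rows could compare value and quality exactly and allow the timestamp to drift within a tolerance. In the sketch below, the `ReadResult` shape, the 50 ms tolerance, and the numeric quality value are placeholders chosen for illustration; the real result fields and tolerance still have to be decided.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ReadResult:
    """Hypothetical normalized result from either the x86 harness or the x64 client."""
    value: object
    quality: int
    timestamp: datetime


def parity_mismatches(reference, candidate, tolerance=timedelta(milliseconds=50)):
    """Return the names of fields where candidate disagrees with reference."""
    mismatches = []
    # Compare type as well as value so that True and 1 do not count as equal.
    if type(reference.value) is not type(candidate.value) or reference.value != candidate.value:
        mismatches.append("value")
    if reference.quality != candidate.quality:
        mismatches.append("quality")
    if abs(reference.timestamp - candidate.timestamp) > tolerance:
        mismatches.append("timestamp")
    return mismatches


# Placeholder data; 192 is only a stand-in quality value.
t0 = datetime(2026, 4, 25, 12, 0, 0, tzinfo=timezone.utc)
mxaccess = ReadResult(True, 192, t0)
managed_ok = ReadResult(True, 192, t0 + timedelta(milliseconds=20))
managed_bad = ReadResult(1, 192, t0 + timedelta(seconds=1))
```

Returning the list of mismatched fields, rather than a single boolean, keeps parity failures easy to diagnose per scenario.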
Exit criteria:
- managed client passes parity for the minimum production tag classes,
- documented unsupported cases are explicit.
## Phase 6: production hardening
Before replacing the sidecar:
- fuzz message parsers with captured and mutated frames,
- soak-test subscriptions with production-scale tag counts,
- run AppEngine/Platform stop-start tests,
- verify no unbounded queues,
- verify reconnect backoff,
- verify audit/security behavior for writes,
- expose metrics:
  - connected state,
  - heartbeat age,
  - subscription count,
  - pending request count,
  - reconnect count,
  - bad frame count.
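For the reconnect backoff item, one common choice is capped exponential backoff. The sketch below shows only the delay schedule; the base delay, growth factor, and cap are assumed values rather than project decisions, and jitter is left out so the schedule stays deterministic and easy to test.

```python
def backoff_delays(attempts, base_s=0.5, factor=2.0, cap_s=30.0):
    """Return the wait in seconds before each reconnect attempt.

    The delay grows geometrically from base_s and never exceeds cap_s.
    """
    delays = []
    delay = base_s
    for _ in range(attempts):
        delays.append(min(delay, cap_s))
        delay = min(delay * factor, cap_s)
    return delays


schedule = backoff_delays(8)
```

A production client would probably add random jitter to each delay so that many clients do not reconnect in lockstep after a runtime restart.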
## Immediate next work items
Completed in the 2026-04-25 run:
1. Created the x86 MXAccess trace harness at `src\MxTraceHarness`.
2. Queried the Galaxy repository for safe candidate tags using
`analysis\sql\select_capture_tags.sql`.
3. Captured register, add/remove, subscribe, invalid reference, array naming,
and advised write scenarios under `captures\`.
4. Converted `netsh.etl` traces to `network.pcapng` with `etl2pcapng`.
5. Installed Wireshark 4.6.4 and attempted Npcap 1.87 installation.
Current blocker:
- The local NMX payload is on a loopback path (`::1`) that `netsh trace` and
`pktmon` did not expose as pcap payload in this session. `dumpcap` cannot
capture until Npcap installs successfully.
Next work items:
1. Complete an elevated/interactive Npcap install or use API Monitor.
2. Recapture scalar subscribe, array subscribe, invalid subscribe, and advised
write on the NMX loopback path.
3. Write the first decoder for loopback NMX frame boundaries and timestamps.
4. Decide whether direct socket protocol or lower `IDataClient` behavior is the
better implementation target after loopback payloads are available.