Improve XML documentation coverage across src modules and sync generated analysis artifacts.

2026-03-14 03:56:58 -04:00
parent ba0d65317a
commit 46ead5ea9f
152 changed files with 2821 additions and 11284 deletions
--- a/docs/plans/2026-03-14-dtp-parser-design.md
+++ b/docs/plans/2026-03-14-dtp-parser-design.md
@@ -0,0 +1,75 @@
+# dotTrace DTP Parser Design
+
+**Goal:** Build a repository-local tool that starts from a raw dotTrace `.dtp` snapshot family and emits machine-readable JSON call-tree data suitable for LLM-driven hotspot analysis.
+
+**Context**
+
+The target snapshot format is JetBrains dotTrace multi-file storage:
+
+- `snapshot.dtp` is the index/manifest.
+- `snapshot.dtp.0000`, `.0001`, and related files hold the storage sections.
+- `snapshot.dtp.States` holds UI state and is not sufficient for call-tree analysis.
+
+The internal binary layout is not publicly specified. A direct handwritten decoder would be brittle and expensive to maintain. The machine already has dotTrace installed, and the shipped JetBrains assemblies expose snapshot storage, metadata, and performance call-tree readers. The design therefore uses dotTrace’s local runtime libraries as the authoritative decoder while still starting from the raw `.dtp` files.
+
+**Architecture**
+
+Two layers:
+
+1. A small .NET helper opens the raw snapshot, reads the performance DFS call-tree and node payload sections, resolves function names through the profiler metadata section, and emits JSON.
+2. A Python CLI is the user-facing entrypoint. It validates input, builds or reuses the helper, runs it, and writes JSON to stdout or a file.
+
+This keeps the user workflow Python-first while using the only reliable decoder available for the undocumented snapshot format.
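
The Python entrypoint described above could be sketched roughly as follows. This is a minimal sketch, not the repository's implementation: the helper project path, the output directory, the `DtpHelper` name, and the convention that the helper takes the snapshot path as its only argument and prints JSON to stdout are all assumptions made for illustration.

```python
import argparse
import subprocess
import sys
from pathlib import Path

# Hypothetical locations: the design does not name the helper project.
HELPER_PROJECT = Path("tools/dtp-helper/DtpHelper.csproj")
HELPER_OUT = Path("tools/dtp-helper/out")
HELPER_DLL = HELPER_OUT / "DtpHelper.dll"


def validate_snapshot(path: Path) -> Path:
    """Check that the argument looks like an existing .dtp index file."""
    if path.suffix != ".dtp":
        raise ValueError(f"expected a .dtp index file, got: {path}")
    if not path.is_file():
        raise FileNotFoundError(f"snapshot index not found: {path}")
    return path


def ensure_helper() -> Path:
    """Reuse the built helper if it exists, otherwise build it once."""
    if not HELPER_DLL.exists():
        subprocess.run(
            ["dotnet", "build", str(HELPER_PROJECT),
             "-c", "Release", "-o", str(HELPER_OUT)],
            check=True,
        )
    return HELPER_DLL


def build_arg_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Export a dotTrace call tree as JSON.")
    parser.add_argument("snapshot", type=Path, help="path to snapshot.dtp")
    parser.add_argument("-o", "--output", type=Path, default=None,
                        help="write JSON here instead of stdout")
    return parser


def main(argv=None) -> int:
    args = build_arg_parser().parse_args(argv)
    snapshot = validate_snapshot(args.snapshot)
    helper = ensure_helper()
    # Assumed contract: the helper prints the JSON document to stdout.
    result = subprocess.run(["dotnet", str(helper), str(snapshot)],
                            check=True, capture_output=True, text=True)
    if args.output is not None:
        args.output.write_text(result.stdout, encoding="utf-8")
    else:
        sys.stdout.write(result.stdout)
    return 0
```

Keeping validation and the build step in separate functions lets the wrapper report which stage failed before the helper is ever launched.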
+
+**Output schema**
+
+The JSON should support both direct consumption and downstream summarization:
+
+- `snapshot`: source path, thread count, node count, payload type.
+- `thread_roots`: thread root metadata.
+- `call_tree`: synthetic root with recursive children.
+- `hotspots`: flat top-N lists, one ranked by inclusive time and one by exclusive time.
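
One way the `hotspots` lists could be derived from `call_tree` is sketched below. It assumes nodes are plain dicts carrying the per-node fields of this schema; the `inclusive` and `exclusive` key names, the `limit` parameter, and the choice to rank only `method` nodes (without merging repeated method names) are assumptions, not part of the design.

```python
import heapq


def iter_nodes(node):
    """Yield a node and all of its descendants, depth-first."""
    stack = [node]
    while stack:
        current = stack.pop()
        yield current
        stack.extend(current.get("children", []))


def build_hotspots(call_tree, limit=20):
    """Return flat top lists by inclusive and by exclusive time."""
    # Assumption: only method nodes are ranked, so the synthetic root and
    # thread nodes do not crowd out real hotspots.
    methods = [n for n in iter_nodes(call_tree) if n.get("kind") == "method"]

    def top_by(key):
        ranked = heapq.nlargest(limit, methods, key=lambda n: n[key])
        return [{"id": n["id"], "name": n["name"], key: n[key]} for n in ranked]

    return {
        "inclusive": top_by("inclusive_time"),
        "exclusive": top_by("exclusive_time"),
    }
```

Ranking individual nodes keeps each hotspot traceable to one position in the tree; aggregating by method name would be a separate design decision.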
+
+Each node should include:
+
+- `id`: stable offset-based identifier.
+- `name`: resolved method or synthetic node name.
+- `kind`: `root`, `thread`, `method`, or `special`.
+- `inclusive_time`
+- `exclusive_time`
+- `call_count`
+- `thread_name` when relevant
+- `children`
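
The node shape listed above could be modelled as a small dataclass, shown here as a sketch. The field types are assumptions: the design does not state time units or the exact format of the offset-based `id`, so times are floats and ids are strings purely for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

NODE_KINDS = {"root", "thread", "method", "special"}


@dataclass
class CallTreeNode:
    id: str                    # assumed: offset rendered as a string
    name: str
    kind: str
    inclusive_time: float      # assumed: units are not fixed by the design
    exclusive_time: float
    call_count: int
    thread_name: Optional[str] = None
    children: List["CallTreeNode"] = field(default_factory=list)

    def __post_init__(self):
        if self.kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {self.kind!r}")

    def to_dict(self) -> dict:
        """Serialize recursively, omitting thread_name when it is not set."""
        data = {
            "id": self.id,
            "name": self.name,
            "kind": self.kind,
            "inclusive_time": self.inclusive_time,
            "exclusive_time": self.exclusive_time,
            "call_count": self.call_count,
        }
        if self.thread_name is not None:
            data["thread_name"] = self.thread_name
        data["children"] = [child.to_dict() for child in self.children]
        return data


# Usage with made-up values:
leaf = CallTreeNode("0x40", "Demo.Worker.Run", "method", 8.0, 8.0, 3)
thread = CallTreeNode("0x10", "Main Thread", "thread", 8.0, 0.0, 1,
                      thread_name="Main Thread", children=[leaf])
print(json.dumps(thread.to_dict(), indent=2))
```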
+
+**Resolution strategy**
+
+Method names are resolved from the snapshot’s metadata section:
+
+- Use the snapshot’s FUID-to-metadata converter.
+- Map `FunctionUID` to `FunctionId`.
+- Resolve `MetadataId`.
+- Read function and class data with `MetadataSectionHelpers`.
+
+Synthetic and special frames fall back to explicit labels instead of opaque numeric values where possible.
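
The lookup chain and its fallback can be illustrated with plain dictionaries. The sketch below does not use or imitate the dotTrace API: every mapping, key, and label in it is a hypothetical stand-in for the converter and metadata readers named above, and it only shows the order of the lookups and the fallback to explicit labels.

```python
def resolve_name(function_uid, fuid_to_function_id, function_to_metadata_id,
                 metadata, special_labels):
    """Return (name, kind) for a frame, falling back to explicit labels.

    All four mappings are hypothetical dict stand-ins for the real readers.
    """
    function_id = fuid_to_function_id.get(function_uid)
    metadata_id = function_to_metadata_id.get(function_id)
    entry = metadata.get(metadata_id)
    if entry is not None:
        return f"{entry['class']}.{entry['method']}", "method"
    # Synthetic or special frame: prefer a readable label over a raw number.
    if function_uid in special_labels:
        return special_labels[function_uid], "special"
    return f"<unresolved:{function_uid}>", "special"


# Usage with made-up identifiers:
fuid_map = {101: 7}
metadata_ids = {7: 9001}
metadata = {9001: {"class": "Demo.Worker", "method": "Run"}}
labels = {0: "<native code>"}

print(resolve_name(101, fuid_map, metadata_ids, metadata, labels))
print(resolve_name(0, fuid_map, metadata_ids, metadata, labels))
```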
+
+**Error handling**
+
+The tool should fail loudly, with an explicit error, in these cases:
+
+- Missing dotTrace assemblies.
+- Unsupported snapshot layout.
+- Missing metadata sections.
+- Helper build or execution failure.
+
+Errors should name the failing stage so the Python wrapper can surface actionable messages.
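
A sketch of how the wrapper might carry the failing stage through to the user follows. The stage names and the `stage: message` convention for the first line of the helper's stderr are assumptions chosen for this example; the design only requires that errors name the stage.

```python
# Hypothetical stage names, one per failure case listed above.
STAGES = ("assemblies", "layout", "metadata", "helper")


class StageError(RuntimeError):
    """An error tagged with the pipeline stage that failed."""

    def __init__(self, stage: str, detail: str):
        super().__init__(f"[{stage}] {detail}")
        self.stage = stage
        self.detail = detail


def parse_helper_error(stderr: str) -> StageError:
    """Turn helper stderr into a StageError.

    Assumed convention: the first stderr line is '<stage>: <message>'.
    Anything else is reported as a generic helper failure.
    """
    lines = stderr.strip().splitlines()
    first = lines[0] if lines else ""
    stage, sep, detail = first.partition(":")
    if sep and stage.strip() in STAGES:
        return StageError(stage.strip(), detail.strip())
    return StageError("helper", first or "helper failed without output")


# Usage with a made-up message:
err = parse_helper_error("metadata: profiler metadata section not found\n")
print(err)  # [metadata] profiler metadata section not found
```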
+
+**Testing**
+
+Use the checked-in sample snapshot at `snapshots/js-ordered-consume.dtp` for an end-to-end test:
+
+- JSON parses successfully.
+- The root contains thread children.
+- Hotspot lists are populated.
+- At least one non-special method name is resolved.
+
+These checks verify the extraction path without pinning the test to an exact copy of the full output.
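
The four checks could be expressed as a single function that a test calls on the tool's output. This is a sketch under assumptions: it takes the JSON text directly rather than invoking the tool, and the `inclusive` and `exclusive` keys under `hotspots` are assumed names that the design does not specify.

```python
import json


def iter_nodes(node):
    """Yield a node and all of its descendants, depth-first."""
    stack = [node]
    while stack:
        current = stack.pop()
        yield current
        stack.extend(current.get("children", []))


def check_snapshot_json(text: str) -> dict:
    """Apply the end-to-end checks to the tool's JSON output."""
    doc = json.loads(text)  # check 1: the JSON parses

    root = doc["call_tree"]
    assert any(c["kind"] == "thread" for c in root["children"]), \
        "root has no thread children"

    hotspots = doc["hotspots"]
    assert hotspots["inclusive"] and hotspots["exclusive"], \
        "hotspot lists are empty"

    assert any(n["kind"] == "method" and n["name"] for n in iter_nodes(root)), \
        "no resolved method names"
    return doc
```

In the real test, `text` would be the output produced for `snapshots/js-ordered-consume.dtp`; none of the assertions depend on specific timings or method names.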