Improve XML documentation coverage across src modules and sync generated analysis artifacts.

Joseph Doherty
2026-03-14 03:56:58 -04:00
parent ba0d65317a
commit 46ead5ea9f
152 changed files with 2821 additions and 11284 deletions

@@ -0,0 +1,75 @@
# dotTrace DTP Parser Design
**Goal:** Build a repository-local tool that starts from a raw dotTrace `.dtp` snapshot family and emits machine-readable JSON call-tree data suitable for LLM-driven hotspot analysis.
**Context**
The target snapshot format is JetBrains dotTrace multi-file storage:
- `snapshot.dtp` is the index/manifest.
- `snapshot.dtp.0000`, `.0001`, and related files hold the storage sections.
- `snapshot.dtp.States` holds UI state and is not sufficient for call-tree analysis.
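
As an illustration of the file family described above, the following sketch groups the files that sit next to an index by role. The function name `describe_snapshot_family` and the returned keys are hypothetical, and the numeric-suffix pattern is inferred only from the `.0000`/`.0001` examples, not from any published specification.

```python
import re
from pathlib import Path

# Storage sections look like "snapshot.dtp.0000". The digit width is inferred
# from the examples in this document, so any run of digits is accepted.
_SECTION_SUFFIX = re.compile(r"\.\d+$")


def describe_snapshot_family(index_path):
    """Group the files beside a .dtp index by their role (hypothetical helper)."""
    index = Path(index_path)
    siblings = sorted(index.parent.glob(index.name + ".*"))
    return {
        "index": index,
        "storage": [p for p in siblings if _SECTION_SUFFIX.search(p.name)],
        "states": [p for p in siblings if p.name == index.name + ".States"],
    }
```

A wrapper could use the result to reject an index that has no storage files next to it before invoking any decoder.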
The internal binary layout is not publicly specified. A direct handwritten decoder would be brittle and expensive to maintain. The machine already has dotTrace installed, and the shipped JetBrains assemblies expose snapshot storage, metadata, and performance call-tree readers. The design therefore uses dotTrace's local runtime libraries as the authoritative decoder while still starting from the raw `.dtp` files.
**Architecture**
Two layers:
1. A small .NET helper opens the raw snapshot, reads the performance DFS call-tree and node payload sections, resolves function names through the profiler metadata section, and emits JSON.
2. A Python CLI is the user-facing entrypoint. It validates input, builds or reuses the helper, runs it, and writes JSON to stdout or a file.
This keeps the user workflow Python-first while using the only reliable decoder available for the undocumented snapshot format.
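
A minimal sketch of the Python layer, assuming the helper is an executable that takes the index path as its only argument and prints JSON on stdout. The `--helper` and `--out` flags and the `run_helper` name are hypothetical, and the build-or-reuse step is omitted.

```python
import argparse
import json
import subprocess
import sys
from pathlib import Path


def run_helper(helper_cmd, snapshot):
    """Validate the input, run the helper, and return its stdout parsed as JSON."""
    snapshot = Path(snapshot)
    if not snapshot.is_file():
        raise FileNotFoundError(f"snapshot index not found: {snapshot}")
    proc = subprocess.run(
        [*helper_cmd, str(snapshot)], capture_output=True, text=True
    )
    if proc.returncode != 0:
        raise RuntimeError(
            f"helper exited with {proc.returncode}: {proc.stderr.strip()}"
        )
    return json.loads(proc.stdout)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Export a dotTrace call tree as JSON")
    parser.add_argument("snapshot", help="path to the .dtp index file")
    # Hypothetical flag: where the already-built helper executable lives.
    parser.add_argument("--helper", required=True)
    parser.add_argument("--out", help="output file (default: stdout)")
    args = parser.parse_args(argv)

    text = json.dumps(run_helper([args.helper], args.snapshot), indent=2)
    if args.out:
        Path(args.out).write_text(text, encoding="utf-8")
    else:
        sys.stdout.write(text + "\n")
    return 0
```

Passing the helper as a command list keeps the wrapper independent of how the helper is launched, whether as a native executable or through a runtime host.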
**Output schema**
The JSON should support both direct consumption and downstream summarization:
- `snapshot`: source path, thread count, node count, payload type.
- `thread_roots`: thread root metadata.
- `call_tree`: synthetic root with recursive children.
- `hotspots`: flat top lists for inclusive and exclusive time.
Each node should include:
- `id`: stable offset-based identifier.
- `name`: resolved method or synthetic node name.
- `kind`: `root`, `thread`, `method`, or `special`.
- `inclusive_time`
- `exclusive_time`
- `call_count`
- `thread_name` when relevant
- `children`
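
The node fields and the flat hotspot lists could be modeled roughly as follows. This is a sketch under assumptions: the integer `id`, the numeric time fields (units unspecified here), the restriction of hotspots to `method` nodes, and the `inclusive`/`exclusive` keys inside `hotspots` are all illustrative choices rather than part of the design above.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Node:
    id: int                 # assumed to be an integer offset
    name: str
    kind: str               # "root", "thread", "method", or "special"
    inclusive_time: float
    exclusive_time: float
    call_count: int
    thread_name: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def walk(node):
    """Yield a node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def hotspots(root, key, limit=10):
    """Return the top method nodes by `key`, without merging repeated names."""
    methods = [n for n in walk(root) if n.kind == "method"]
    methods.sort(key=lambda n: getattr(n, key), reverse=True)
    return [{"id": n.id, "name": n.name, key: getattr(n, key)} for n in methods[:limit]]


def export(root, limit=10):
    """Assemble the tree and hotspot parts of the output document."""
    return {
        "call_tree": asdict(root),
        "hotspots": {
            "inclusive": hotspots(root, "inclusive_time", limit),
            "exclusive": hotspots(root, "exclusive_time", limit),
        },
    }
```

Because `asdict` recurses into nested dataclasses and lists, the result of `export` can be passed straight to `json.dumps`.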
**Resolution strategy**
Method names are resolved from the snapshot's metadata section:
- Use the snapshot's FUID-to-metadata converter.
- Map `FunctionUID` to `FunctionId`.
- Resolve `MetadataId`.
- Read function and class data with `MetadataSectionHelpers`.
Synthetic and special frames fall back to explicit labels instead of opaque numeric values where possible.
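
The lookup chain and its fallback can be illustrated in Python with plain dictionaries standing in for the real lookups. In the actual design this logic runs inside the .NET helper against JetBrains types; every name below (`resolve_name`, the mapping arguments, the `<unresolved:…>` label format) is hypothetical.

```python
def resolve_name(function_uid, fuid_to_function_id, function_id_to_metadata_id,
                 metadata, special_labels):
    """Follow FunctionUID -> FunctionId -> MetadataId, else fall back to a label.

    Returns a (name, kind) pair. All four mappings are plain dicts used as
    stand-ins for the snapshot's metadata readers.
    """
    # Known synthetic frames get an explicit label instead of a raw number.
    if function_uid in special_labels:
        return special_labels[function_uid], "special"

    function_id = fuid_to_function_id.get(function_uid)
    metadata_id = function_id_to_metadata_id.get(function_id)
    entry = metadata.get(metadata_id)
    if entry is None:
        # No metadata available: keep the value visible but clearly marked.
        return f"<unresolved:{function_uid}>", "special"

    class_name, method_name = entry
    return f"{class_name}.{method_name}", "method"
```

Returning the kind together with the name lets the caller populate the node's `kind` field from the same lookup.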
**Error handling**
The tool should fail loudly for the cases that matter:
- Missing dotTrace assemblies.
- Unsupported snapshot layout.
- Missing metadata sections.
- Helper build or execution failure.
Errors should name the failing stage so the Python wrapper can surface actionable messages.
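
One way the wrapper might surface stage names is sketched below. It assumes the helper reports failures as a JSON object with `stage` and `message` keys on stderr; that convention, the `StageError` class, and the `run-helper` fallback stage are assumptions made for illustration.

```python
import json


class StageError(RuntimeError):
    """An error tagged with the pipeline stage that failed."""

    def __init__(self, stage, message):
        super().__init__(f"[{stage}] {message}")
        self.stage = stage


def error_from_helper(returncode, stderr_text):
    """Turn helper stderr into a StageError, tolerating unstructured output."""
    try:
        payload = json.loads(stderr_text)
        stage, message = payload["stage"], payload["message"]
    except (ValueError, KeyError, TypeError):
        # Not the assumed JSON shape: attribute the failure to the run itself.
        stage = "run-helper"
        message = stderr_text.strip() or f"exit code {returncode}"
    return StageError(stage, message)
```

The fallback branch keeps crashes that bypass the structured reporting from being silently dropped.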
**Testing**
Use the checked-in sample snapshot at `snapshots/js-ordered-consume.dtp` for an end-to-end test:
- JSON parses successfully.
- The root contains thread children.
- Hotspot lists are populated.
- At least one non-special method name is resolved.
This is enough to verify the extraction path without freezing the entire output.
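
The structural checks listed above might look like the following sketch. It operates on an already-parsed document so it can be exercised without the sample snapshot; the `check_export` name and the assumption that `hotspots` is a mapping of non-empty lists are illustrative.

```python
def iter_nodes(node):
    """Yield a node dict and all of its descendants."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def check_export(data):
    """Assert the structural properties listed above on a parsed export."""
    root = data["call_tree"]
    assert any(c["kind"] == "thread" for c in root["children"]), "no thread children"

    hot = data["hotspots"]
    # Assumes a mapping such as {"inclusive": [...], "exclusive": [...]}.
    assert hot and all(hot[k] for k in hot), "empty hotspot list"

    assert any(
        n["kind"] == "method" and n["name"] for n in iter_nodes(root)
    ), "no resolved method names"
```

An end-to-end test would feed it the parsed CLI output for the sample snapshot, which also covers the "JSON parses successfully" check.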