Phase 2.5 cleanup: 15-item backlog burndown #3
Summary
Phase 2.5 burns down the 15-item backlog (5 from Phase 1.5 + 10 from Phase 2.5/3) tracked in CLAUDE.md. 8 tasks across 5 waves; both old backlog sections are now empty. New follow-ups discovered during execution are tracked in a fresh "Phase 2.6 / 3 backlog" section.
Note: this PR currently includes Phase 2 (PR #2) since #2 hasn't merged. Once #2 merges, this PR auto-narrows to just Phase 2.5's 27 commits / ~3.4k lines.
What shipped (8 tasks, 5 waves)
- `open_db` refactor with `check_same_thread` parameter (eliminates duplicated PRAGMA setup in `chat/web/bots.py`)
- `bot_reset` purges orphaned "you" activity rows (also fixes a latent FK constraint crash)
- `prompt.py` polish (3 sub-commits)
- … `host` vs `guest` derived from chat membership; fixes guest-as-speaker memory retrieval)
- `ACTIVITIES:` block with bullet-level trim
- … `edge_knowledge_fact` projector branch)
- … `memory_witness` projector branch)
- `chat/services/addressee.py` classifier service: classifier-based addressee detection, significance for interjection memories, scene-close-on-cancel pinned, stale-guest cleanup

Architecture notes
Test plan
- … `|safe` on user content, all SQL parameterized).
- … `SignificanceJob` was enqueued for the interjection memory
- … `turn_html_replace` to actually swap live)

Phase 2.6 / 3 backlog (discovered during execution, tracked in CLAUDE.md)
- `turn_html_replace` SSE event (closes Phase 1.5 backlog #2 end-to-end)
- `_witness_role_for` defensive coding for `host_bot_id is None`
- `Literal["high","medium","low"]` type tightening on `AddresseeDecision.confidence`

Plan
`docs/plans/2026-04-26-v2.5-phase2.5-cleanup.md` (committed in `e05f28e`).

Idempotent seeder for three sample bots (Maya: coworker slow-burn; Eli: live-in partner; Sam: bartender / new connection). Each is a distinct relational archetype to exercise the system from different angles.

Run from repo root:

    .venv/bin/python scripts/seed_sample_bots.py

Re-running skips ids that already exist. After seeding, walk each bot through kickoff parse-and-confirm at /bots/<id>/kickoff.

The kickoff parse-and-confirm route was 500-ing intermittently because Hermes-3 + Featherless's response_format={"type":"json_object"} only guarantees JSON output, NOT a particular schema. The model was inventing its own field names (sceneTime, entities, settingDetails) instead of the KickoffParse fields, causing Pydantic validation to fail on both classify() retries.

Three changes:

1. Include the Pydantic JSON schema in the system prompt so the model knows exactly which keys to produce. Affects every classify() call (kickoff parse, turn parse, scene-close detect, significance, state-update, scene summarize). Strip ```json fences if the model wraps its output. Bump retries 2 → 3 (model is stochastic; one extra attempt closes most of the remaining gap).
2. parse_kickoff() now passes a default empty KickoffParse so the route degrades to a fillable form instead of 500 when the classifier ultimately fails. The confirm form is the human-in-the-loop; an empty form is strictly better UX than a stack trace.
3. Tests updated: bumped canned-failure arrays from 2 → 3 entries to match the new attempt count; renamed kickoff test from "raises_when_classifier_fails_twice" to "falls_back_to_empty_when_classifier_fails" reflecting the new degraded-but-usable behavior.

Verified live with all 3 sample bots (maya/eli/sam): the kickoff route returns 200 across multiple attempts. Full suite: 168 passed.

13 tasks across 6 waves (1, 2, 3, 4a, 4b, 5). Designed for parallel subagent execution where file-disjointness allows.
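The file-disjointness condition for a wave can be sketched as a pairwise check over each task's file set. This is only an illustration: the function name `find_file_conflicts`, the task labels, and the file lists below are hypothetical and are not taken from the plan or from `.tasks.json`.

```python
from itertools import combinations


def find_file_conflicts(wave_tasks: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return (task_a, task_b, shared_files) for every overlapping pair of tasks.

    An empty result means the wave is file-disjoint, so its tasks could be
    dispatched in parallel without merge conflicts on shared files.
    """
    conflicts = []
    # Sort by task label so the output order is deterministic.
    ordered = sorted(wave_tasks.items(), key=lambda item: item[0])
    for (task_a, files_a), (task_b, files_b) in combinations(ordered, 2):
        shared = files_a & files_b
        if shared:
            conflicts.append((task_a, task_b, shared))
    return conflicts


# Hypothetical waves: the first is disjoint, the second shares a hot file.
disjoint_wave = {
    "task_a": {"schema.py", "handlers.py"},
    "task_b": {"world.py"},
}
hot_wave = {
    "task_c": {"turns.py", "prompt.py"},
    "task_d": {"turns.py"},
}

print(find_file_conflicts(disjoint_wave))
print(find_file_conflicts(hot_wave))
```

A wave whose check returns any conflict would be split into single-task waves, which matches how hot files are handled in the plan.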
Waves 1, 2, 4a, and 5 each contain 2-3 tasks that touch disjoint files and can be dispatched concurrently via the Agent tool with isolation: "worktree". Waves 3 (drawer guest support) and 4b (multi-entity turn flow) are single-task because they touch hot files (_drawer.html, turns.py) that cannot be safely co-modified.

Plan covers:
- T36: group_node schema + handlers (new migration 0008)
- T37: guest_added / guest_removed event handlers (modifies world.py)
- T38: relationship-seed service ("have they met?")
- T39: interjection classifier service
- T40: multi-entity state-update coordinator (6 directed pairs)
- T41: multi-witness memory write helper
- T42: drawer guest add/remove UI + render
- T43: multi-entity prompt assembly (extends T18)
- T44: multi-entity turn flow (rewrites post_turn)
- T45: multi-entity per-POV summaries on scene close
- T46: witness filter cross-coverage tests
- T47: bot_reset cascades to guest references
- T48: Phase 2 documentation update

Plan also documents:
- Worktree-per-subagent dispatch pattern using Agent isolation flag
- Merge ordering per wave (file-disjointness = conflict-free merges)
- Failure recovery (cancel failed parallel task, re-dispatch as solo)
- Conflict prevention checklist (verify Files sections disjoint per wave)

Tasks file (.tasks.json) carries dependency DAG with `blockedBy` and `parallelGroup` so a future executing-plans run can dispatch correctly.

NOT EXECUTING. Plan only.

T18 review (Phase 1) noted the NICE-tier trim drops previous-scene FIRST while §6.3 spec lists previous-scene LAST in the NICE tier group. Decision: keep the existing greedy order (previous-scene first), and document why.

Rationale (now in code at the trim ladder):
1. Cheapest-impact-first: a per-POV previous-scene summary loses less narrative continuity than the older dialogue turns or memory hits it competes with.
2. Greedy lookahead is more expensive than the marginal narrative loss.
Dropping previous-scene typically clears the soft-budget slack in one step.

Test added: test_nice_trim_order_documented pins the observed order (previous-scene -> memories -> dialogue) so a future refactor can't silently invert it. The fixture is sized so that an all-NICE config overflows the soft budget but fits once previous-scene alone is dropped, which proves that memories and older dialogue turns survive while previous-scene is the FIRST drop.

Memories grow per-flag witness checkboxes (you / host / guest) that auto-submit on change via HTMX. The new POST route emits a manual_edit event with target_kind=memory_witness and a {flag, value} payload; prior_value mirrors the same shape so an inverse edit restores the flag.

The drawer's recent-memories query now selects the three witness columns alongside the existing fields so the template can render checkbox state without a second query per row.
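The mirrored payload / prior_value shape can be illustrated with a minimal sketch. Only `target_kind=memory_witness`, the `{flag, value}` payload, and the three flag names come from the description above; the `ManualEdit` dataclass, `make_witness_edit`, `invert`, and `apply_edit` are hypothetical names used for illustration and may not match the real event model.

```python
from dataclasses import dataclass

WITNESS_FLAGS = ("you", "host", "guest")


@dataclass
class ManualEdit:
    """Hypothetical stand-in for a manual_edit event row."""
    target_kind: str
    target_id: int
    payload: dict       # {"flag": ..., "value": ...}
    prior_value: dict   # same shape, holding the value before the edit


def make_witness_edit(memory_id: int, flag: str, value: bool,
                      current_flags: dict[str, bool]) -> ManualEdit:
    """Build the event for toggling one witness flag on a memory."""
    if flag not in WITNESS_FLAGS:
        raise ValueError(f"unknown witness flag: {flag!r}")
    return ManualEdit(
        target_kind="memory_witness",
        target_id=memory_id,
        payload={"flag": flag, "value": value},
        prior_value={"flag": flag, "value": current_flags[flag]},
    )


def invert(edit: ManualEdit) -> ManualEdit:
    """Swap payload and prior_value; works because both share one shape."""
    return ManualEdit(edit.target_kind, edit.target_id,
                      payload=edit.prior_value, prior_value=edit.payload)


def apply_edit(edit: ManualEdit, flags: dict[str, bool]) -> dict[str, bool]:
    """Return a copy of the witness flags with the edit applied."""
    updated = dict(flags)
    updated[edit.payload["flag"]] = edit.payload["value"]
    return updated


flags = {"you": True, "host": True, "guest": False}
edit = make_witness_edit(7, "guest", True, flags)
after = apply_edit(edit, flags)
restored = apply_edit(invert(edit), after)
```

Because the two fields share a shape, an undo needs no special casing: the inverse is the same event with the fields swapped.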