chat/docs/plans/2026-04-26-v3-phase3-implementation.md
Joseph Doherty 379054755a docs: add Phase 3 implementation plan with parallel-safe waves
19 tasks across 8 waves covering events with lifecycles, time skips
(elision + jump), active threads, significance/retrieval refinements,
and meanwhile scenes (host+guest with no 'you'). Mirrors the Phase 2
plan structure: pre-flight, parallel-execution strategy with worktree
isolation, file-disjointness analysis per wave, and per-task TDD spec
with commit messages.

Phase 3 schema: adds 0009_events.sql, 0010_threads.sql,
0011_meanwhile_scenes.sql (final version 11). Builds on Phase 2's
3-entity scene support and event-sourced architecture.
2026-04-26 16:55:50 -04:00

# Roleplay Engine — Phase 3 Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use `superpowers-extended-cc:executing-plans` to implement this plan task-by-task. Use the parallel-dispatch pattern documented under "Parallel-Execution Strategy" for waves that fan out to multiple subagents.
**Goal:** Add events with lifecycles, time skips (elision + jump), active threads, significance/retrieval refinements, and "Meanwhile…" scenes (host+guest with no "you" present). All scoped to a single chat; the cross-chat surface remains unchanged.
**Architecture:** Builds on Phase 2's event-sourced architecture and 3-entity scene support. New event kinds (`event_planned`, `event_started`, `event_completed`, `event_cancelled`, `event_expired`, `time_skip_elision`, `time_skip_jump`, `thread_opened`, `thread_updated`, `thread_closed`, `meanwhile_scene_started`, `meanwhile_scene_closed`, `synthesized_memories`) carry the new state changes. Two new tables (`events`, `threads`) hold lifecycle state. Existing handlers (`memory_written`, `edge_update`) gain new payload sources without changes — promotion logic lives in services, not in projector handlers.
**Tech Stack:** Same as Phase 2 (Python 3.11+, FastAPI, HTMX, SQLite, Featherless). No new dependencies.
**Source-of-truth references:**
- Phase 3 scope: requirements doc §13 "Phase 3 — events, skips, threads"
- Behavioral details: §4 (per-chat clocks), §6.3 (prompt assembly), §6.4 (drawer), §8.1 (retrieved-memory inputs), §9 ("Time, Skips, Events — Phase 3 surface"), §11 (significance & compression)
- Conventions: [../../CLAUDE.md](../../CLAUDE.md) §"Behavioral defaults" + §"Phase 2 status"
- Phase 2 plan (style, TDD pattern, parallel-dispatch mechanics): [2026-04-26-v2-phase2-implementation.md](2026-04-26-v2-phase2-implementation.md)
When a task says "see §X", that's the requirements doc unless stated otherwise.
---
## Pre-flight
**Branch:** create `phase-3` from the latest `main` after Phase 2 has merged. If Phase 2 is still in PR review, branch off `phase-2` directly:
```bash
# Option A: after main has phase-2 merged
git checkout main && git pull && git checkout -b phase-3
# Option B: continue from phase-2 directly
git checkout phase-2 && git pull && git checkout -b phase-3
```
**Schema baseline:** Phase 2 leaves the DB at version 8. Phase 3 adds three migrations: `0009_events.sql` (T49), `0010_threads.sql` (T51), and `0011_meanwhile_scenes.sql` (T63), ending at version 11. No other migrations expected.
**Phase 2.5 backlog:** the items in CLAUDE.md §"Phase 2.5 / 3 backlog" are NOT scoped here — they should be cleaned up in a separate branch off `main` (suggested name `phase-2.5`) before or in parallel with Phase 3. None of them blocks Phase 3.
**Pinned non-negotiables (carried forward):**
- State changes go through the event log. Use `append_and_apply(conn, kind, payload)` for the live path; `apply_event` only after a fresh `append_event` returning the new id.
- Witness filter every memory read at SQL level (hard `WHERE` constraint; never a soft signal).
- Edges are directed; `botA → botB` and `botB → botA` are independent records.
- Per-POV scene summaries — never write omniscient narration. (Meanwhile scenes write per-POV summaries for both present bots; you receive a digest later, not during the scene.)
- TDD: every task starts with a failing test.
- One commit per task minimum, more if it splits naturally.
**Verification before claiming done:** Use `superpowers-extended-cc:verification-before-completion` — run the test command, paste actual output. Don't assume green.
---
## Parallel-Execution Strategy
Same pattern as Phase 2. Eight waves: parallel within each wave (file-disjoint), serial across waves. The controller (you, the controlling Claude session) merges each subagent's commits and verifies the suite stays green before dispatching the next wave.
### How to dispatch a wave in parallel
Use the **Agent tool with `isolation: "worktree"`** so each subagent gets its own git worktree. The runtime cleans up the worktree automatically if no changes are made; otherwise it returns the path + branch for the controller to merge. (If the controlling session's working directory is **not** the chat repo, create worktrees manually with `git worktree add .worktrees/<wave>-<task> -b <wave>/<task> phase-3` from inside the chat repo and pass the worktree path explicitly into each subagent prompt — that is the pattern Phase 2 used.)
In a single message, dispatch all tasks in the wave:
```
Agent({
  description: "Wave 1 — T49 events table + handlers",
  subagent_type: "general-purpose",
  isolation: "worktree",
  prompt: "<full task text from below>",
})
Agent({
  description: "Wave 1 — T50 time_skip handlers",
  subagent_type: "general-purpose",
  isolation: "worktree",
  prompt: "<full task text from below>",
})
Agent({
  description: "Wave 1 — T51 threads table + handlers",
  subagent_type: "general-purpose",
  isolation: "worktree",
  prompt: "<full task text from below>",
})
```
All subagents start simultaneously, each working on a private worktree branched off `phase-3`. They cannot see each other's changes (no shared filesystem state) — that's the safety guarantee.
### After a wave completes
1. Each subagent returns its worktree path and commit SHA.
2. **Run a spec + code-quality reviewer subagent on each completed task** (combined review is acceptable for purely mechanical schema/handler tasks; large or integration tasks like T62, T63 deserve separate spec + quality reviewers).
3. **Merge the wave into `phase-3`** in any order (file-disjointness guarantees no conflict). Use `--no-ff` so each task's history stays grouped:
```bash
git checkout phase-3
for branch in <wave-branches>; do
  git merge --no-ff "$branch" -m "merge: <task description>"
done
```
4. **Run the full test suite** on the merged `phase-3`. If it's red, the wave's mutual-independence assumption was violated — bisect to find the offending pair, fix in a follow-up commit, re-merge.
5. **Push `phase-3`** to gitea so the work is durable before the next wave starts.
6. Optionally clean up worktrees: `git worktree remove .worktrees/<branch>` and `git branch -D <branch>`.
### Conflict prevention checklist (apply before dispatch)
For each parallel wave, verify the **Files** sections of all tasks have **no overlapping paths**. The waves below are designed to satisfy this; if you decide to add or merge tasks, re-check.
If a hot file (`chat/web/turns.py`, `chat/services/prompt.py`, `chat/web/drawer.py`, `chat/templates/_drawer.html`, `chat/services/regenerate.py`) needs changes from multiple tasks, do **not** parallelize them — serialize within the wave or split into separate waves.
### Failure recovery
If one subagent fails (test failures, blocked, infinite loop):
- **Do not block the wave on a failure.** Cancel the failed subagent, merge the others' successful work, and re-dispatch the failed task as a single follow-up.
- If a failure exposes a bad assumption shared by multiple tasks (e.g. an event-payload schema mismatch), pause the wave and revisit the plan.
### Why each wave is parallel-safe
| Wave | Tasks | Hot files touched | Disjoint? |
|------|-------|-------------------|-----------|
| 1 | T49, T50, T51 | new SQL migrations + new state modules; T50 also extends `chat/state/world.py` (additive) | ✅ |
| 2 | T52, T53, T54, T55 | new service modules only | ✅ |
| 3 | T56, T57, T58 | new service module (T56) + `chat/state/memory.py` retrieval extension (T57) + `chat/services/scene_summarize.py` (T58) | ✅ |
| 4 | T59 | `chat/web/drawer.py`, `chat/templates/_drawer.html` | (single task) |
| 5a | T60, T61 | `chat/services/prompt.py` (T60), `chat/web/turns.py` (T61) | ✅ |
| 5b | T62 | `chat/web/turns.py`, plus a new skip route module | (single task; depends on 5a) |
| 6 | T63, T64, T65 | meanwhile is tightly coupled — see Wave 6 sub-structure below | ⚠️ partial |
| 7 | T66, T67 | new test file + docs only | ✅ |
**Wave 6 sub-structure:** T63 is schema/state (new files); T64 is service + extends `chat/web/turns.py`; T65 is service + extends `chat/services/prompt.py`. T64 and T65 are file-disjoint relative to each other but both depend on T63's schema landing first. Dispatch as: T63 alone → merge → T64+T65 in parallel → merge.
---
## Task overview
```
Wave 1 ─┬─ T49: events table + lifecycle handlers
        ├─ T50: time_skip event kinds + handlers (advance chat clock)
        └─ T51: threads table + open/update/close handlers
Wave 2 ─┬─ T52: event-lifecycle detection service (narrative → state changes)
        ├─ T53: skip narration service (elision + jump prose)
        ├─ T54: synthesized-memories service (jump skip "anything notable?")
        └─ T55: thread-detection service (on scene close, identify open threads)
Wave 3 ─┬─ T56: event-completion promotion (inventory / edges / memories)
        ├─ T57: significance retrieval ranking refinements
        └─ T58: scene compression keeps key quotes when significance ≥ 2
Wave 4 ─── T59: drawer additions — events panel, threads panel, skip controls
Wave 5a ─┬─ T60: prompt assembly includes active events + active threads
         └─ T61: turn flow invokes event-detection + thread-update per turn
Wave 5b ─── T62: skip command surface (parse + route + jump UI prompt)
Wave 6 ─┬─ T63: meanwhile scene config — schema + state + scene-config-4 marker
        └─ (after T63 merges)
           ├─ T64: meanwhile turn flow (host+guest, no "you")
           └─ T65: meanwhile summary digest (briefs you on next active scene)
Wave 7 ─┬─ T66: cross-feature integration tests (events × skips × threads × meanwhile)
        └─ T67: Phase 3 documentation update
```
Critical path: 9 sequential merge points (Waves 1, 2, 3, 4, 5a, 5b, 6a, 6b, 7; Wave 6 merges twice, once for T63 and once for T64+T65). Total tasks: 19. The wall-clock gain from parallelism depends on subagent dispatch overhead, but in principle each wave's tasks run concurrently in roughly the time of one task.
---
## Wave 1 — Schema & state foundation
These three tasks are **fully independent**: each adds a new SQL migration + new state module. T50 also adds two handlers to `chat/state/world.py` (additive, alongside Phase 2's `_apply_guest_added`).
### Task 49: Events table + lifecycle handlers
**Files:**
- Create: `chat/db/migrations/0009_events.sql`
- Create: `chat/state/events.py`
- Create: `tests/test_events_state.py`
**Spec:** Adds the `events` table and projector handlers for the lifecycle: `event_planned`, `event_started`, `event_completed`, `event_cancelled`, `event_expired`. Each event row carries `chat_id`, `kind` (free-form domain-event tag like `"date_at_park"`), `status` (`planned|active|completed|cancelled|expired`), `props_json` (arbitrary blob), `planned_for` (ISO-8601 chat-clock string, optional), `started_at` / `completed_at` (chat-clock strings).
**Step 1: failing test** — see pattern in `tests/test_group_node.py` (Phase 2 T36). Three tests minimum:
1. `test_event_planned_creates_row`: append `event_planned` with `kind`, `props_json`, `planned_for`; project; assert `get_event(conn, event_id)` returns the row with `status="planned"`.
2. `test_event_started_then_completed_updates_status`: append `event_planned` → `event_started` → `event_completed`; assert `status` transitions and `completed_at` populated.
3. `test_event_cancelled_terminal`: append `event_planned` → `event_cancelled`; assert `status="cancelled"`. A subsequent `event_started` is ignored (handler no-op when status is terminal).
**Step 3: implementation** — `0009_events.sql`:
```sql
CREATE TABLE events (
    id TEXT PRIMARY KEY,  -- caller-allocated string id (evt_ + 12 hex chars)
    chat_id TEXT NOT NULL,
    kind TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'planned',
    props_json TEXT NOT NULL DEFAULT '{}',
    planned_for TEXT,
    started_at TEXT,
    completed_at TEXT,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX events_chat_idx ON events(chat_id, status);
```
`chat/state/events.py`:
- `@on("event_planned")` inserts a new row with status `planned`. Payload provides a stable `event_id` (caller-allocated UUID) so the projector is idempotent.
- `@on("event_started")` updates status to `active` and sets `started_at` from payload (or current chat clock).
- `@on("event_completed")`, `@on("event_cancelled")`, `@on("event_expired")` each move to the named terminal state and stamp `completed_at` (the column doubles as "ended at").
- `get_event(conn, event_id)`, `list_active_events(conn, chat_id)`, `list_events_in_status(conn, chat_id, status)` readers.
- All handlers no-op when the row is already in a terminal state (idempotent re-projection safety).
**Step 5: commit** — `feat: events table + lifecycle handlers (T49)`.
**Notes for the implementer:**
- Use UUID-style ids (e.g., `f"evt_{uuid.uuid4().hex[:12]}"`) created by the caller; pass as `event_id` in payload. Don't auto-generate inside the projector.
- Schema version after this migration alone: 9. The full Phase 3 baseline is 11 (T51 adds `0010_threads.sql`; T63 adds `0011_meanwhile_scenes.sql`).
- `tests/test_world.py::test_schema_version_after_migration_is_8` will need to bump after Wave 1 merges — handle in the wave-merge step (mirrors Phase 2 T36's pattern).
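
A minimal sketch of the handler logic, assuming `id` stores the caller-allocated string id as a `TEXT` key and using a trimmed `events` table. Plain functions stand in for the project's `@on(...)` registration, and the payload key `at` (the chat-clock timestamp for the transition) is a hypothetical name. `INSERT OR IGNORE` keeps re-projection of `event_planned` idempotent; the terminal-state check turns late transitions into no-ops.

```python
import sqlite3
import uuid

TERMINAL = {"completed", "cancelled", "expired"}

# Event-log kind -> status the row moves to.
STATUS_FOR_KIND = {
    "event_started": "active",
    "event_completed": "completed",
    "event_cancelled": "cancelled",
    "event_expired": "expired",
}

# Trimmed version of 0009_events.sql (created_at/updated_at omitted).
SCHEMA = """
CREATE TABLE events (
    id TEXT PRIMARY KEY,
    chat_id TEXT NOT NULL,
    kind TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'planned',
    props_json TEXT NOT NULL DEFAULT '{}',
    planned_for TEXT,
    started_at TEXT,
    completed_at TEXT
)
"""


def new_event_id() -> str:
    """Allocated by the caller, never inside the projector."""
    return f"evt_{uuid.uuid4().hex[:12]}"


def apply_event_planned(conn: sqlite3.Connection, payload: dict) -> None:
    # OR IGNORE: replaying the same event_planned still leaves one row.
    conn.execute(
        "INSERT OR IGNORE INTO events (id, chat_id, kind, props_json, planned_for)"
        " VALUES (?, ?, ?, ?, ?)",
        (
            payload["event_id"],
            payload["chat_id"],
            payload["kind"],
            payload.get("props_json", "{}"),
            payload.get("planned_for"),
        ),
    )


def apply_transition(conn: sqlite3.Connection, kind: str, payload: dict) -> None:
    row = conn.execute(
        "SELECT status FROM events WHERE id = ?", (payload["event_id"],)
    ).fetchone()
    if row is None or row[0] in TERMINAL:
        return  # unknown or already-terminal event: no-op
    new_status = STATUS_FOR_KIND[kind]
    # started_at on activation; completed_at doubles as "ended at".
    # The column name comes from this fixed pair, never from user input.
    column = "started_at" if new_status == "active" else "completed_at"
    conn.execute(
        f"UPDATE events SET status = ?, {column} = ? WHERE id = ?",
        (new_status, payload.get("at"), payload["event_id"]),  # "at": assumed key
    )
```
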
---
### Task 50: Time-skip event kinds + chat-clock handlers
**Files:**
- Modify: `chat/state/world.py` (add `_apply_time_skip_elision` and `_apply_time_skip_jump`; both update `chats.time`, and the jump handler may also reset `activity` rows)
- Create: `tests/test_time_skip_handlers.py`
**Spec:** Two new event kinds.
- `time_skip_elision` payload: `{chat_id, new_time}`. Handler updates `chats.time = ?`. Activity rows are NOT reset: an elision fast-forwards the current activity to its end state, and that end state is the resolution itself. The caller appends a follow-up `activity_changed` event only when the landing activity needs to change.
- `time_skip_jump` payload: `{chat_id, new_time, reset_activity: bool}`. Handler updates `chats.time = ?`; if `reset_activity` is true, deletes per-chat `activity` rows for the participants in that chat (a fresh landing state will be set by a follow-up `activity_changed` event from the skip service).
These are pure state mutations. T59 (drawer skip routes) and T62 (skip command surface) fire them via `append_and_apply`.
**Tests:** 3 minimum.
1. `test_elision_advances_chat_clock_only`: seed chat at time T0; append `time_skip_elision` with `new_time=T1`; project; assert `get_chat(...)["time"] == T1` and activity unchanged.
2. `test_jump_with_reset_clears_activity`: seed chat with one activity row; append `time_skip_jump` with `reset_activity=True`; assert chat clock advanced AND activity table empty for that chat.
3. `test_jump_without_reset_preserves_activity`: same seed; `reset_activity=False`; assert activity row still present and clock advanced.
**Implementation:** new handlers next to `_apply_chat_created` in `chat/state/world.py`. Use the same parameterized SQL patterns. Do NOT add UI here — T62 wires the skip command flow.
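
A sketch of the two handlers against minimal stand-in tables. The `chats(id, time)` and `activity(chat_id, bot_id, description)` shapes are assumptions for illustration; the real handlers should use Phase 2's actual columns and registration decorator.

```python
import sqlite3

# Assumed stand-in schema: chats(id, time), activity(chat_id, bot_id, description).


def _set_clock(conn: sqlite3.Connection, payload: dict) -> None:
    conn.execute(
        "UPDATE chats SET time = ? WHERE id = ?",
        (payload["new_time"], payload["chat_id"]),
    )


def apply_time_skip_elision(conn: sqlite3.Connection, payload: dict) -> None:
    # Clock only; a follow-up activity_changed event handles any new activity.
    _set_clock(conn, payload)


def apply_time_skip_jump(conn: sqlite3.Connection, payload: dict) -> None:
    _set_clock(conn, payload)
    if payload.get("reset_activity", False):
        # Clear every participant's activity, scoped to this chat only.
        conn.execute("DELETE FROM activity WHERE chat_id = ?", (payload["chat_id"],))
```
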
**Commit:** `feat: time_skip event handlers (T50)`.
---
### Task 51: Threads table + open/update/close handlers
**Files:**
- Create: `chat/db/migrations/0010_threads.sql`
- Create: `chat/state/threads.py`
- Create: `tests/test_threads_state.py`
**Spec:** Adds the `threads` table and projector handlers for `thread_opened`, `thread_updated`, `thread_closed`. A thread is a per-chat narrative continuity tag — open during scenes, surfaced to prompt assembly so successor scenes can reference unresolved arcs.
`0010_threads.sql`:
```sql
CREATE TABLE threads (
    id TEXT PRIMARY KEY,  -- caller-allocated string id (thread_id in payloads)
    chat_id TEXT NOT NULL,
    title TEXT NOT NULL,
    summary TEXT NOT NULL DEFAULT '',
    status TEXT NOT NULL DEFAULT 'open',  -- open | closed
    opened_at TEXT NOT NULL DEFAULT (datetime('now')),
    closed_at TEXT,
    last_referenced_scene_id INTEGER,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX threads_chat_status_idx ON threads(chat_id, status);
```
`chat/state/threads.py`:
- `@on("thread_opened")` payload: `{thread_id, chat_id, title, summary?}`. Inserts a new row with `status='open'`.
- `@on("thread_updated")` payload: `{thread_id, summary, last_referenced_scene_id?}`. Updates summary + optional last-referenced-scene pointer.
- `@on("thread_closed")` payload: `{thread_id, closed_at?}`. Sets status='closed', stamps `closed_at`.
- Readers: `get_thread(conn, thread_id)`, `list_open_threads(conn, chat_id)`, `list_threads(conn, chat_id, status=None)`.
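
One way to get the "closed is terminal" rule without a read-before-write is to put the status guard in each `UPDATE`'s `WHERE` clause. This sketch assumes string thread ids in a `TEXT` key (matching `ThreadCandidate.existing_thread_id: str`) and trims the table to the columns it touches.

```python
import sqlite3

# Trimmed stand-in for 0010_threads.sql; assumes a TEXT id.
SCHEMA = """
CREATE TABLE threads (
    id TEXT PRIMARY KEY,
    chat_id TEXT NOT NULL,
    title TEXT NOT NULL,
    summary TEXT NOT NULL DEFAULT '',
    status TEXT NOT NULL DEFAULT 'open',
    closed_at TEXT,
    last_referenced_scene_id INTEGER
)
"""


def apply_thread_opened(conn: sqlite3.Connection, p: dict) -> None:
    conn.execute(
        "INSERT OR IGNORE INTO threads (id, chat_id, title, summary) VALUES (?, ?, ?, ?)",
        (p["thread_id"], p["chat_id"], p["title"], p.get("summary", "")),
    )


def apply_thread_updated(conn: sqlite3.Connection, p: dict) -> None:
    # `AND status = 'open'` makes updates to closed threads a no-op;
    # COALESCE keeps the old scene pointer when the payload omits it.
    conn.execute(
        "UPDATE threads SET summary = ?,"
        " last_referenced_scene_id = COALESCE(?, last_referenced_scene_id)"
        " WHERE id = ? AND status = 'open'",
        (p["summary"], p.get("last_referenced_scene_id"), p["thread_id"]),
    )


def apply_thread_closed(conn: sqlite3.Connection, p: dict) -> None:
    conn.execute(
        "UPDATE threads SET status = 'closed', closed_at = COALESCE(?, datetime('now'))"
        " WHERE id = ? AND status = 'open'",
        (p.get("closed_at"), p["thread_id"]),
    )


def list_open_threads(conn: sqlite3.Connection, chat_id: str) -> list[tuple]:
    return conn.execute(
        "SELECT id, title, summary FROM threads"
        " WHERE chat_id = ? AND status = 'open' ORDER BY rowid",
        (chat_id,),
    ).fetchall()
```
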
**Tests:** 3 minimum.
1. `test_thread_opened_creates_row`.
2. `test_thread_updated_changes_summary_and_last_referenced`.
3. `test_thread_closed_terminal`: subsequent `thread_updated` is ignored (matches the design's "closed threads are kept for replay but don't surface in prompt").
**Note:** the Phase 2 `group_node.threads_json` column was a Phase-3 placeholder and is NOT used as authoritative storage now — `threads` table is the source of truth. The drawer can choose to render either, but Phase 3 onward should treat the table as canonical and treat `group_node.threads_json` as a deprecated cache that we leave alone (or clear in the next migration).
**Commit:** `feat: threads table + projector handlers (T51)`.
---
## Wave 2 — Classifier services (parallel)
Four tasks, all new service modules — fully file-disjoint.
### Task 52: Event-lifecycle detection service
**Files:**
- Create: `chat/services/event_lifecycle.py`
- Create: `tests/test_event_lifecycle.py`
**Spec:** A classifier-wrapped service that inspects a freshly-narrated turn and decides whether any active events transitioned this turn (started, completed, cancelled). Returns a structured `EventLifecycleDecision` with one or more `EventTransition(event_id, new_status, reason)` items, or empty when nothing changed.
Schema:
```python
from pydantic import BaseModel, Field


class EventTransition(BaseModel):
    event_id: str
    new_status: str  # "active" | "completed" | "cancelled"
    reason: str = ""


class EventLifecycleDecision(BaseModel):
    transitions: list[EventTransition] = Field(default_factory=list)
```
Public API:
```python
async def detect_event_transitions(
    client: LLMClient,
    *,
    classifier_model: str,
    narrative_text: str,
    active_events: list[dict],  # [{id, kind, status, props}, ...] from list_active_events
    timeout_s: float = 30.0,
) -> EventLifecycleDecision:
    """Decide whether any active events transitioned this turn. Conservative
    bias — most turns return empty transitions. Trigger only when the
    narrative text clearly resolves or starts a known active event.
    """
```
Caller (T61 turn flow) appends one `event_started` / `event_completed` / `event_cancelled` event per transition via `append_and_apply`.
**Tests:** 3 minimum — happy path with one transition, empty active_events short-circuits without classifier call, classifier failure returns empty default.
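
A sketch of the two guard rails from the test list: the empty-input short-circuit and the fail-safe default. Dataclasses stand in for the pydantic models so it runs on the standard library alone, and `client.classify(model=..., prompt=...)` returning a plain dict is an assumed interface, not the project's actual `LLMClient` signature. Dropping transitions for unknown event ids is an extra conservative choice made here, not something the spec requires.

```python
import asyncio
import json
from dataclasses import dataclass, field

ALLOWED_STATUSES = {"active", "completed", "cancelled"}


@dataclass
class EventTransition:
    event_id: str
    new_status: str
    reason: str = ""


@dataclass
class EventLifecycleDecision:
    transitions: list[EventTransition] = field(default_factory=list)


async def detect_event_transitions(
    client, *, classifier_model, narrative_text, active_events, timeout_s=30.0
) -> EventLifecycleDecision:
    if not active_events:
        return EventLifecycleDecision()  # nothing can transition: skip the LLM call
    prompt = (
        "Active events:\n" + json.dumps(active_events)
        + "\n\nNarrative:\n" + narrative_text
        + "\n\nReturn only transitions this text clearly causes."
    )
    try:
        # Assumed interface: classify(model=..., prompt=...) -> dict.
        raw = await asyncio.wait_for(
            client.classify(model=classifier_model, prompt=prompt), timeout_s
        )
        known_ids = {e["id"] for e in active_events}
        transitions = [
            EventTransition(t["event_id"], t["new_status"], t.get("reason", ""))
            for t in raw.get("transitions", [])
            if t.get("event_id") in known_ids and t.get("new_status") in ALLOWED_STATUSES
        ]
    except Exception:
        return EventLifecycleDecision()  # timeout, API error, or malformed output
    return EventLifecycleDecision(transitions)
```
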
**Commit:** `feat: event-lifecycle detection service (T52)`.
---
### Task 53: Skip narration service
**Files:**
- Create: `chat/services/skip_narration.py`
- Create: `tests/test_skip_narration.py`
**Spec:** Generates the brief transition narration that bridges a time skip. Two flavors mirroring §9:
- **Elision:** "skip to when we arrive". Input: current activity ("walking to park"), expected end-state ("at the park, sitting on a bench"). Output: 1-2 sentence transition prose narrated from the host bot's POV. New chat-clock value is provided by the caller.
- **Jump:** "next morning". Input: time delta + landing-state hint (optional). Output: 2-3 sentences setting the scene at the new time.
Public API:
```python
async def narrate_skip(
    client: LLMClient,
    *,
    narrative_model: str,
    skip_kind: str,  # "elision" | "jump"
    speaker_bot: dict,  # {id, name, persona}
    you_name: str,
    current_time: str,
    new_time: str,
    current_activity: str,
    landing_state_hint: str = "",
    timeout_s: float = 60.0,
) -> str:
    """Generate brief transition prose. Returns plain text, not JSON."""
```
Uses `client.generate(...)` (not `classify`) since output is free-form prose. Falls back to a deterministic template string on failure (e.g., `f"({new_time}: {landing_state_hint or current_activity}.)"`). The fallback ensures the skip flow never blocks even when the LLM is down.
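
The fallback contract can be sketched as a thin wrapper: any exception, timeout, or empty completion yields the deterministic template, so the skip flow always gets text back. `client.generate(model=..., prompt=...)` returning a string is an assumed interface, and the prompt wording is illustrative only.

```python
import asyncio


def fallback_skip_text(new_time: str, landing_state_hint: str, current_activity: str) -> str:
    # Deterministic template from the plan; the new time is always visible.
    return f"({new_time}: {landing_state_hint or current_activity}.)"


async def narrate_skip(
    client, *, narrative_model, skip_kind, speaker_bot, you_name,
    current_time, new_time, current_activity, landing_state_hint="", timeout_s=60.0,
) -> str:
    fallback = fallback_skip_text(new_time, landing_state_hint, current_activity)
    length = "1-2 sentences" if skip_kind == "elision" else "2-3 sentences"
    prompt = (  # illustrative prompt, not the project's real template
        f"As {speaker_bot['name']}, narrate a time skip for {you_name} in {length}.\n"
        f"From {current_time} to {new_time}. Current activity: {current_activity}.\n"
        f"Landing state: {landing_state_hint or 'unspecified'}."
    )
    try:
        # Assumed interface: generate(model=..., prompt=...) -> str.
        text = await asyncio.wait_for(
            client.generate(model=narrative_model, prompt=prompt), timeout_s
        )
    except Exception:
        return fallback  # LLM down or too slow: never block the skip
    text = (text or "").strip()
    return text or fallback  # treat an empty completion as a failure too
```
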
**Tests:** 3 minimum — happy elision, happy jump, generation failure returns fallback string with the new time visible.
**Commit:** `feat: skip narration service (T53)`.
---
### Task 54: Synthesized-memories service
**Files:**
- Create: `chat/services/synthesized_memories.py`
- Create: `tests/test_synthesized_memories.py`
**Spec:** When the user does a jump skip ("a week later") they're prompted "anything notable happen?" If they answer with prose, this service parses that prose into 1-N synthesized memories per present bot. Each memory carries `source="synthesized"`, `reliability=0.7`, witness mask `[1, 1, 0]` or `[1, 1, 1]` per present set, and a one-sentence text body.
Schema:
```python
from pydantic import BaseModel, Field


class SynthesizedMemory(BaseModel):
    text: str
    significance: int = 1  # 0..3, default 1
    affinity_delta: int = 0
    trust_delta: int = 0


class SynthesizedDigest(BaseModel):
    memories: list[SynthesizedMemory] = Field(default_factory=list)
```
Public API:
```python
async def synthesize_memories(
    client: LLMClient,
    *,
    classifier_model: str,
    prose: str,
    bot_name: str,  # which witness's POV
    bot_persona: str,
    you_name: str,
    timeout_s: float = 30.0,
) -> SynthesizedDigest:
    """Parse 'anything notable happen?' prose into structured memories
    from a single bot's POV. Empty/whitespace prose short-circuits."""
```
Caller (T62 skip flow) calls this once per present bot (host always; guest if present), then writes via `record_turn_memory_for_present` with `source="synthesized"` and the synthesized text in place of narrative_text.
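
The write side can be sketched as a pure mapping from one bot's digest to memory payloads, which the caller would then hand to `record_turn_memory_for_present` (once for the host, and once more for the guest when present). The helper `digest_to_payloads`, its payload keys, and the clamping of `significance` to 0..3 are hypothetical; dataclasses stand in for the pydantic models.

```python
from dataclasses import dataclass, field

SYNTHESIZED_RELIABILITY = 0.7


@dataclass
class SynthesizedMemory:
    text: str
    significance: int = 1
    affinity_delta: int = 0
    trust_delta: int = 0


@dataclass
class SynthesizedDigest:
    memories: list[SynthesizedMemory] = field(default_factory=list)


def digest_to_payloads(
    digest: SynthesizedDigest, *, owner_id: str, witness_mask: list[int]
) -> list[dict]:
    """Hypothetical helper: one payload per non-empty memory, one POV per call."""
    payloads = []
    for memory in digest.memories:
        text = memory.text.strip()
        if not text:
            continue  # skip empty bodies the classifier may emit
        payloads.append({
            "owner_id": owner_id,
            "text": text,
            "source": "synthesized",
            "reliability": SYNTHESIZED_RELIABILITY,
            "significance": max(0, min(3, memory.significance)),  # keep in 0..3
            "witness_mask": list(witness_mask),
        })
    return payloads
```
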
**Tests:** 3 minimum — happy path returns parseable memories, empty prose short-circuits, classifier failure returns empty digest.
**Commit:** `feat: synthesized-memories service for jump skips (T54)`.
---
### Task 55: Thread-detection service
**Files:**
- Create: `chat/services/thread_detection.py`
- Create: `tests/test_thread_detection.py`
**Spec:** On scene close, classify the scene transcript to detect open threads (unresolved arcs, dangling questions, promises made). Returns a list of `ThreadCandidate(title, summary, action: "open"|"update"|"close", existing_thread_id?)`.
The service receives the current set of open threads so it can decide to **update** an existing thread rather than open a duplicate. It can also signal **close** when the transcript clearly resolves an open thread.
Schema:
```python
from pydantic import BaseModel, Field


class ThreadCandidate(BaseModel):
    action: str  # "open" | "update" | "close"
    title: str = ""  # required for "open"; ignored otherwise
    summary: str = ""
    existing_thread_id: str | None = None  # required for "update"/"close"


class ThreadDetectionResult(BaseModel):
    candidates: list[ThreadCandidate] = Field(default_factory=list)
```
Public API:
```python
async def detect_threads(
    client: LLMClient,
    *,
    classifier_model: str,
    scene_transcript: list[dict],  # [{speaker, text}, ...]
    open_threads: list[dict],  # [{id, title, summary}, ...]
    timeout_s: float = 30.0,
) -> ThreadDetectionResult:
    """Classify scene close into thread open/update/close candidates."""
```
Caller (T58 scene compression — added in Wave 3) loops over candidates and emits one `thread_opened`, `thread_updated`, or `thread_closed` event per candidate.
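
Before emitting events, the caller can enforce the schema's "required for" rules so a sloppy classifier answer cannot open an untitled thread or touch a closed one. A sketch, assuming the caller knows the set of currently open thread ids; the helper name and the `thr_` id format are hypothetical, and dropping invalid candidates (rather than raising) mirrors the fail-soft behavior elsewhere in the plan.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThreadCandidate:
    action: str
    title: str = ""
    summary: str = ""
    existing_thread_id: Optional[str] = None


def candidates_to_events(candidates, *, chat_id, open_thread_ids, scene_id=None):
    """Hypothetical helper: map candidates to (kind, payload) pairs."""
    events = []
    for c in candidates:
        if c.action == "open":
            if not c.title.strip():
                continue  # "open" requires a title
            thread_id = f"thr_{uuid.uuid4().hex[:12]}"  # hypothetical id format
            events.append(("thread_opened", {
                "thread_id": thread_id, "chat_id": chat_id,
                "title": c.title.strip(), "summary": c.summary,
            }))
        elif c.action in ("update", "close") and c.existing_thread_id in open_thread_ids:
            if c.action == "update":
                events.append(("thread_updated", {
                    "thread_id": c.existing_thread_id, "summary": c.summary,
                    "last_referenced_scene_id": scene_id,
                }))
            else:
                events.append(("thread_closed", {"thread_id": c.existing_thread_id}))
        # Anything else (unknown action, missing or closed thread id) is dropped.
    return events
```
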
**Tests:** 3 minimum — opens a new thread, updates an existing thread (test asserts `existing_thread_id` is honored), classifier failure returns empty.
**Commit:** `feat: thread-detection service (T55)`.
---
## Wave 3 — Promotion & retrieval refinements
Three tasks. T56 is a new service module (event-completion promotion). T57 modifies `chat/state/memory.py` to add a significance-aware retrieval rank. T58 modifies `chat/services/scene_summarize.py` to integrate compression hints + the thread-detection service from T55. File-disjoint.
### Task 56: Event-completion promotion
**Files:**
- Create: `chat/services/event_promotion.py`
- Create: `tests/test_event_promotion.py`
**Spec:** When an event reaches `completed` (the only terminal state that promotes; cancelled/expired do NOT promote per §9 last paragraph), the orchestrator promotes any structured artifacts the event carried into the appropriate target store:
- `event.props.acquired_objects: list[str]` → in Phase 4, append `inventory_added` events (that schema lands in Phase 4). Phase 3 stub: instead append a `manual_edit` with `target_kind="memory_pov_summary"` that records the acquisition in the host's memory.
- `event.props.knowledge_facts: list[{owner_id, target_id, fact}]` → append `edge_update` events with the facts on the named directed edge.
- `event.props.relationship_change: {summary, source_id, target_id}` → append `manual_edit` with `target_kind="edge_summary"` for that pair.
- Everything else stays in the closed event record (the projector kept the row; no further promotion).
Public API:
```python
def promote_completed_event(
conn,
*,
event_id: str,
chat_id: str,
chat_clock_at: str | None,
) -> dict:
"""Read the completed event's props_json and emit promotion events.
Returns a summary dict {inventory: int, knowledge: int, relationship: int}
of how many promotion events fired. No classifier calls — purely
structural. Skips if event status isn't 'completed'."""
```
This is **synchronous** (no async, no LLM). It reads a row, parses JSON, emits events via `append_and_apply`.
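
Splitting planning from emission keeps the promotion rules testable without a database: a pure function decides what to append, and `promote_completed_event` would then call `append_and_apply(conn, kind, payload)` for each pair. `plan_promotion` is a hypothetical helper, and the payload keys for `manual_edit` and `edge_update` are assumptions; the real shapes come from Phase 2's handlers.

```python
import json


def plan_promotion(event_row: dict, *, host_id: str) -> tuple[list[tuple[str, dict]], dict]:
    """Hypothetical helper: ((kind, payload) pairs to append, summary counts)."""
    summary = {"inventory": 0, "knowledge": 0, "relationship": 0}
    if event_row["status"] != "completed":
        return [], summary  # cancelled, expired, and non-terminal events never promote
    props = json.loads(event_row.get("props_json") or "{}")
    planned = []
    for obj in props.get("acquired_objects", []):
        # Phase 3 stub: note the acquisition in the host's memory, not inventory.
        planned.append(("manual_edit", {
            "target_kind": "memory_pov_summary", "owner_id": host_id,
            "text": f"Acquired {obj}.",
        }))
        summary["inventory"] += 1
    for fact in props.get("knowledge_facts", []):
        # Directed edge: owner -> target (assumed payload keys).
        planned.append(("edge_update", {
            "source_id": fact["owner_id"], "target_id": fact["target_id"],
            "fact": fact["fact"],
        }))
        summary["knowledge"] += 1
    change = props.get("relationship_change")
    if change:
        planned.append(("manual_edit", {
            "target_kind": "edge_summary", "source_id": change["source_id"],
            "target_id": change["target_id"], "summary": change["summary"],
        }))
        summary["relationship"] += 1
    return planned, summary
```
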
**Tests:** 4 minimum — empty props no-op, knowledge_facts produces edge_update events, relationship_change produces manual_edit, cancelled-event-doesn't-promote.
**Commit:** `feat: event-completion promotion service (T56)`.
---
### Task 57: Significance-aware retrieval ranking
**Files:**
- Modify: `chat/state/memory.py` (extend `search_memories(conn, owner_id, witness_role, query, k)` to add a significance bias to the rank ordering)
- Modify: `tests/test_memory_search.py` (or wherever the existing search tests live; add 2 tests)
**Spec:** Currently `search_memories` orders by FTS rank only. §11.1 says "Retrieval ranking: significance multiplier applied as `score × constant` to FTS / vector rank." Phase 3 implements this for FTS only (vector retrieval is Phase 4).
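Change the SQL `ORDER BY` from `ORDER BY rank` to `ORDER BY rank * (1 + significance * SIGNIFICANCE_RANK_BIAS)`, keeping the **ascending** sort. FTS5's default `rank` is `bm25()`, which is negative (more negative means a better match), so scaling it up in magnitude promotes high-significance rows; switching to `DESC` would put the weakest FTS matches first. Start the constant at 0.5 (or use another scaling that preserves the sort direction; this is a tuning knob, so document the choice in a comment). The constant may need adjustment after manual play; surface it as a module-level constant `SIGNIFICANCE_RANK_BIAS`.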
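
A self-contained sketch shows why the sort direction matters. It assumes the SQLite build includes FTS5 (most do) and, for brevity, stores `significance` as an `UNINDEXED` column on the FTS table; the real `search_memories` keeps its owner and witness `WHERE` constraints, which are omitted here. It applies the §11.1 multiplier to FTS5's `rank`, which is negative with more negative meaning a better match, so the sort stays ascending.

```python
import sqlite3

# Starting value only; a tuning knob (module-level so callers never hard-code it).
SIGNIFICANCE_RANK_BIAS = 0.5


def search_memories(conn: sqlite3.Connection, query: str, k: int = 5) -> list[tuple]:
    # rank defaults to bm25(): negative, more negative = better match.
    # Multiplying by (1 + significance * bias) pushes significant rows further
    # negative, i.e. earlier in an ascending sort. DESC would invert relevance.
    # NOTE: owner/witness filters from the real query are omitted in this sketch.
    return conn.execute(
        """
        SELECT text, significance
        FROM memories
        WHERE memories MATCH ?
        ORDER BY rank * (1 + significance * ?)
        LIMIT ?
        """,
        (query, SIGNIFICANCE_RANK_BIAS, k),
    ).fetchall()
```

With two identical matching texts, bm25 scores them equally, so the multiplier alone decides the order and the higher-significance row comes first.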
**Tests:** 2 added.
1. `test_higher_significance_outranks_equal_rank`: seed two memories with identical FTS-matching text but different significance scores; assert the higher-significance row appears first in results.
2. `test_significance_bias_is_constant_module_level`: verify the constant is accessible as `chat.state.memory.SIGNIFICANCE_RANK_BIAS` (so it's tunable without a code change in calling sites).
**Commit:** `feat: significance-aware retrieval ranking (T57)`.
---
### Task 58: Scene compression keeps key quotes when significance ≥ 2
**Files:**
- Modify: `chat/services/scene_summarize.py` (extend `apply_scene_close_summary` to also call `detect_threads` from T55 and emit thread events; extend the per-POV summary to include up to 3 verbatim "key quotes" from the closing scene when scene-max-significance ≥ 2)
- Modify: `tests/test_per_pov_summary.py` (add 3 tests for the new behavior)
**Spec:** §11.1 specifies "Compression: scenes with max-turn-significance ≥ 2 retain key quotes; ≤ 1 collapse fully into the per-POV summary." Implement this:
- Compute scene max significance from `memories.significance` rows in this scene.
- When max < 2: existing behavior unchanged (per-POV summary, no extra quotes).
- When max ≥ 2: include up to 3 verbatim quote spans (each ≤ 200 chars) in the per-POV summary text. Format: append `\n\nKey quotes:\n- "..."\n- "..."` to the summary. The `summarize_scene` classifier already produces the prose; the quote-selection step is a deterministic post-process that picks the top-3 highest-significance turn texts from the scene transcript (truncated).
Additionally, after writing per-POV summaries (existing behavior), call `detect_threads` (from T55) once per close. For each candidate emit the matching `thread_opened` / `thread_updated` / `thread_closed` event via `append_and_apply`. Failures fall back to no thread changes (existing memory + edge updates still land).
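
The deterministic quote step can be sketched as below. It assumes the scene's turns arrive as `{text, significance}` dicts in transcript order; both helper names are hypothetical and the `…` truncation marker is a choice, not part of the spec. Python's sort is stable even with `reverse=True`, so ties keep transcript order.

```python
MAX_QUOTES = 3
MAX_QUOTE_CHARS = 200


def select_key_quotes(turns: list[dict]) -> list[str]:
    """Hypothetical helper: top-3 turn texts by significance, or [] when max < 2."""
    if not turns or max(t["significance"] for t in turns) < 2:
        return []  # low-significance scenes collapse fully into the summary
    ranked = sorted(turns, key=lambda t: t["significance"], reverse=True)
    quotes = []
    for turn in ranked[:MAX_QUOTES]:
        text = turn["text"].strip()
        if len(text) > MAX_QUOTE_CHARS:
            text = text[: MAX_QUOTE_CHARS - 1].rstrip() + "…"  # stays <= 200 chars
        quotes.append(text)
    return quotes


def append_key_quotes(summary: str, quotes: list[str]) -> str:
    if not quotes:
        return summary
    return summary + "\n\nKey quotes:\n" + "\n".join(f'- "{q}"' for q in quotes)
```
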
**Tests:** 3 added.
1. `test_low_significance_scene_omits_quotes`: max significance = 1; assert summary text contains no "Key quotes:" header.
2. `test_high_significance_scene_includes_top_3_quotes`: seed 4 memories with significance 3, 2, 1, 2; assert summary contains the top-3 (by significance) verbatim turn texts.
3. `test_thread_detection_emits_events`: stub `detect_threads` to return one `ThreadCandidate(action="open", ...)`; assert a `thread_opened` event landed.
**Commit:** `feat: significance-driven quote retention + thread emission on close (T58)`.
---
## Wave 4 — Drawer additions (single task)
This wave is one task because all Phase 3 drawer additions touch `chat/web/drawer.py` and `chat/templates/_drawer.html` together — splitting would force serial execution with conflicts.
### Task 59: Drawer events / threads / skip controls
**Files:**
- Modify: `chat/web/drawer.py` (extend `GET /chats/{chat_id}/drawer`; add `POST /chats/{chat_id}/drawer/event/plan`, `/drawer/event/cancel/{event_id}`, `/drawer/skip/elision`, `/drawer/skip/jump`, `/drawer/thread/close/{thread_id}`)
- Modify: `chat/templates/_drawer.html` (3 new sections: Events, Threads, Skip controls)
- Create: `tests/test_drawer_events_threads_skip.py`
**Spec:**
**GET extension:**
- `list_active_events(conn, chat_id)` → render in a new "Events" section.
- `list_open_threads(conn, chat_id)` → render in a new "Threads" section.
- A "Skip" subsection with two buttons: "Elision skip" (opens an inline form taking `landing_state_hint` and `new_time`) and "Jump skip" (opens an inline form taking `new_time` as ISO-8601, an optional `notable_prose` for the synthesized-memories prompt, and `reset_activity`).
**POST routes:**
1. `POST /drawer/event/plan`: form `{kind, planned_for, props_json}`. Validates `props_json` as JSON (400 on failure), appends `event_planned` with a caller-allocated `event_id`, returns refreshed drawer.
2. `POST /drawer/event/cancel/{event_id}` — appends `event_cancelled`, returns refreshed drawer.
3. `POST /drawer/skip/elision` — form `{landing_state_hint, new_time}` → calls `narrate_skip` (T53), appends `time_skip_elision` + an `assistant_turn` carrying the narration, returns refreshed drawer + chat partial.
4. `POST /drawer/skip/jump` — form `{new_time, notable_prose, reset_activity}` → calls `narrate_skip` for transition prose, calls `synthesize_memories` (T54) for each present bot, appends `time_skip_jump` + memories + transition turn, returns refreshed drawer + chat partial.
5. `POST /drawer/thread/close/{thread_id}` — appends `thread_closed`, returns refreshed drawer.
**Template additions:**
- "Events" section listing each active event by kind + planned_for + props.
- "Threads" section listing each open thread title + summary + a Close button.
- "Skip" controls under existing Activity section.
- Forms use HTMX (`hx-post`, `hx-target="#drawer"`, `hx-swap="innerHTML"`) consistent with Phase 2 drawer patterns.
**Tests (`tests/test_drawer_events_threads_skip.py`):** 6 minimum.
1. GET drawer with no events/threads → no Events/Threads sections rendered.
2. POST event/plan with valid form → event_planned event appended; drawer body now contains the event's `kind`.
3. POST event/cancel → event_cancelled appended; drawer no longer lists the event under "Active".
4. POST skip/elision → time_skip_elision appended, chat clock advanced, narration assistant_turn present in chat history.
5. POST skip/jump with notable_prose → time_skip_jump + N synthesized memory_written events; assert reliability=0.7 on those rows.
6. POST thread/close → thread_closed appended; thread no longer in open list.
**Commit:** `feat: drawer events / threads / skip controls (T59)`.
**Notes for implementer:**
- The existing `available_guests` dropdown helper from T42 is the reference for form-population patterns.
- For the Jump skip's `notable_prose` field, treat empty as "no synthesized memories" (just advance the clock) — the spec allows this.
- Validate `target_time` ISO format; 400 on parse failure. Do not allow target_time earlier than current chat clock.
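
The `target_time` rules above (ISO parse, 400 on failure, no backwards jumps) can be sketched as below. This assumes the chat clock is available as a `datetime` and that the route turns `ValueError` into a 400; `parse_target_time` is a hypothetical helper name. Note that `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11 onward, and comparing naive with aware datetimes raises `TypeError`, so the sketch rejects mixed inputs explicitly:

```python
from datetime import datetime


def parse_target_time(raw: str, current_clock: datetime) -> datetime:
    """Parse the jump form's ISO-8601 target_time and reject backwards jumps.

    Raises ValueError on a parse failure, a naive/aware mismatch, or a time
    earlier than the chat clock; the route is assumed to map it to HTTP 400.
    """
    try:
        target = datetime.fromisoformat(raw.strip())
    except ValueError as exc:
        raise ValueError(f"target_time is not ISO-8601: {raw!r}") from exc
    if (target.tzinfo is None) != (current_clock.tzinfo is None):
        # Naive vs aware comparison would raise TypeError; surface it as a 400 instead.
        raise ValueError("target_time and chat clock must both be naive or both be aware")
    if target < current_clock:
        raise ValueError("target_time is earlier than the current chat clock")
    return target  # equal to the clock is allowed: the spec only forbids earlier
```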
---
## Wave 5a — Prompt + turn-flow integration (parallel)
T60 modifies `chat/services/prompt.py` and `tests/test_prompt.py`. T61 modifies `chat/web/turns.py`, `chat/services/regenerate.py`, and `tests/test_turn_flow.py`. File-disjoint.
### Task 60: Prompt assembly includes active events + active threads
**Files:**
- Modify: `chat/services/prompt.py` (extend `assemble_narrative_prompt`)
- Modify: `tests/test_prompt.py` (add 3 tests)
**Spec:** Two new SHOULD-tier blocks added between the existing scene-context block and retrieved-memories block:
1. **Active events** — title `Active events:`. Lists each active event in this chat: `- {kind} (planned for {planned_for})` plus a one-line props excerpt (truncate to ~80 chars). Trim-tier SHOULD; drops before retrieved memories under tight budget.
2. **Active threads** — title `Open threads:`. Lists each open thread: `- {title}: {summary}` (summary truncated to ~120 chars). SHOULD-tier.
Both blocks are omitted entirely when their lists are empty (no header rendered).
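
A minimal sketch of the two block renderers, assuming events arrive as mappings with `kind`, `planned_for`, and `props`, and threads as mappings with `title` and `summary` (the real row shapes may differ). The function names are hypothetical; returning `None` lets the caller skip the block, header included, when the list is empty:

```python
import json
from typing import Any, Mapping, Optional, Sequence


def truncate(text: str, limit: int) -> str:
    """Cut text to at most `limit` characters, marking any cut with an ellipsis."""
    return text if len(text) <= limit else text[: limit - 1].rstrip() + "…"


def render_active_events_block(events: Sequence[Mapping[str, Any]]) -> Optional[str]:
    """Return the 'Active events:' block, or None so the caller omits it entirely."""
    if not events:
        return None
    lines = ["Active events:"]
    for ev in events:
        lines.append(f"- {ev['kind']} (planned for {ev['planned_for']})")
        props = ev.get("props") or {}
        if props:
            # One-line props excerpt, capped at ~80 characters.
            lines.append("  " + truncate(json.dumps(props, sort_keys=True), 80))
    return "\n".join(lines)


def render_open_threads_block(threads: Sequence[Mapping[str, Any]]) -> Optional[str]:
    """Return the 'Open threads:' block, or None when no thread is open."""
    if not threads:
        return None
    lines = ["Open threads:"]
    lines += [f"- {t['title']}: {truncate(t['summary'], 120)}" for t in threads]
    return "\n".join(lines)
```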
Per Phase 2 T43's auto-detection precedent, the function reads `list_active_events(conn, chat_id)` and `list_open_threads(conn, chat_id)` itself; no new parameters.
**Tests:** 3 added.
1. `test_assemble_with_no_events_or_threads_omits_blocks` — regression; no events/threads → assembled prompt has neither block.
2. `test_assemble_with_active_events_renders_block` — seed one event_planned + event_started; assert "Active events:" header and event kind appear in prompt.
3. `test_assemble_with_open_thread_renders_block` — seed one thread_opened; assert "Open threads:" header and thread title appear.
**Commit:** `feat: prompt assembly renders active events + open threads (T60)`.
---
### Task 61: Turn flow invokes event-detection + thread-update per turn
**Files:**
- Modify: `chat/web/turns.py` (after the primary narrative + memory + state-update block, call `detect_event_transitions` from T52; emit `event_started`/`event_completed`/`event_cancelled` events accordingly)
- Modify: `chat/services/regenerate.py` (mirror — regenerate also re-detects event transitions for the regenerated turn)
- Modify: `tests/test_turn_flow.py` (add 3 tests)
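**Spec:** After the existing post-turn classifier passes (memory write, state update, interjection check) and BEFORE scene-close detection, read `active_events=list_active_events(conn, chat_id)`. If that list is empty, skip detection entirely (test 3 asserts the classifier is never called). Otherwise call `detect_event_transitions` with `narrative_text=primary_text` and those `active_events`.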
For each `EventTransition` returned:
- `new_status="active"` → append `event_started` payload `{event_id, started_at: chat.time}`.
- `new_status="completed"` → append `event_completed` payload `{event_id, completed_at: chat.time}` AND THEN call `promote_completed_event` (T56) inline so promotion events emit synchronously after completion.
- `new_status="cancelled"` → append `event_cancelled`. Promotion is skipped.
Empty transitions list = no-op (most turns; no extra events written).
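
The dispatch rules above can be sketched as a single function with its collaborators injected, so the ordering (completion, then inline promotion) is testable without a model call. Everything here is an assumption for illustration: `EventTransition` stands in for T52's return type, `detect` is a closure over `detect_event_transitions` with `narrative_text` already bound, `promote` wraps `promote_completed_event`, and the `event_cancelled` payload shape is not specified by the plan:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Mapping, Sequence


@dataclass(frozen=True)
class EventTransition:
    """Hypothetical stand-in for the transition type T52 returns."""

    event_id: int
    new_status: str  # "active" | "completed" | "cancelled"


def apply_event_transitions(
    active_events: Sequence[Mapping[str, Any]],
    detect: Callable[[Sequence[Mapping[str, Any]]], List[EventTransition]],
    append_event: Callable[[str, Dict[str, Any]], None],
    promote: Callable[[int], None],
    chat_time: str,
) -> None:
    """Run per-turn lifecycle detection and append the matching events."""
    if not active_events:
        return  # most turns: no classifier call, no extra events
    for tr in detect(active_events):
        if tr.new_status == "active":
            append_event("event_started", {"event_id": tr.event_id, "started_at": chat_time})
        elif tr.new_status == "completed":
            append_event("event_completed", {"event_id": tr.event_id, "completed_at": chat_time})
            promote(tr.event_id)  # inline, so promotion events land right after completion
        elif tr.new_status == "cancelled":
            # Payload shape assumed; the plan only says "append event_cancelled". No promotion.
            append_event("event_cancelled", {"event_id": tr.event_id})
        else:
            raise ValueError(f"unexpected transition status: {tr.new_status!r}")
```

The same function would serve `regenerate.py`, which keeps the two call sites from drifting apart.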
`regenerate.py` mirrors the same logic for the regenerated turn (existing event transitions from the superseded turn are NOT undone — that's a Phase 3.5 follow-up; document the limitation).
**Tests:** 3 added to `tests/test_turn_flow.py`.
1. `test_turn_with_event_transition_appends_started_event`: mock `detect_event_transitions` to return one transition; assert `event_started` lands in event log; canned-response queue matches.
2. `test_turn_with_event_completion_runs_promotion`: same mock returning `new_status="completed"`; seed a planned event with knowledge_facts in props; assert `event_completed` + `edge_update` (from promotion) both land.
3. `test_turn_with_no_active_events_skips_classifier`: no active events; assert `detect_event_transitions` is never called (its canned response slot would still be in the queue at end of test).
**Commit:** `feat: per-turn event-lifecycle detection + completion promotion (T61)`.
---
## Wave 5b — Skip command flow (single task)
Single task because it modifies `chat/web/turns.py` (which Wave 5a also touched). Run after Wave 5a is merged so the file's recent additions are stable.
### Task 62: Skip command surface
**Files:**
- Modify: `chat/web/turns.py` (extend `parse_turn` to detect natural-language skip commands like "skip to the park", "next morning", "a week later" and route to a skip-handling branch BEFORE the normal narrative flow)
- Create: `chat/web/skip.py` (new module hosting `process_elision_skip(...)` and `process_jump_skip(...)` controllers; called by both turns.py and the drawer skip routes from T59)
- Modify: `tests/test_turn_flow.py` (add 3 tests)
**Spec:** Currently `parse_turn` extracts the user's prose into structured fields (addressee inferred, etc.). Phase 3 adds detection of skip commands as a separate intent.
The classifier-based parse already produces an `intent` field (or similar — verify in code). Extend the schema with `intent="skip_elision"` and `intent="skip_jump"`. When intent is one of these, the turn flow short-circuits the normal narrative path and routes to:
- `process_elision_skip(conn, client, settings, *, chat_id, landing_state_hint=parsed.landing_state)`: calls `narrate_skip(skip_kind="elision")`, appends `time_skip_elision` and an `assistant_turn` carrying the narration.
- `process_jump_skip(conn, client, settings, *, chat_id, target_time=parsed.target_time, notable_prose=parsed.notable_prose)`: appends `time_skip_jump`, calls `synthesize_memories` per present bot, appends the synthesized `memory_written` events, calls `narrate_skip(skip_kind="jump")`, and appends an `assistant_turn` carrying the transition prose.
The drawer routes from T59 share these functions (don't duplicate the logic across drawer.py and turns.py). The controllers therefore only append events; each caller chooses its own response: the turns.py route returns 204, and the drawer routes return the refreshed drawer + chat partial.
For Phase 3's first cut, the jump skip's `notable_prose` is NOT collected from natural language ("a week later, anything notable?" requires a UI prompt). Two options:
- **(simpler)** Drawer-only entry for jump skip; natural-language jump short-circuits to drawer prompt.
- **(better UX)** Natural-language jump returns a 422 with an HTMX-swap that injects the "anything notable?" textarea into the chat surface; user submits prose to a follow-up `/chats/{chat_id}/skip/jump/confirm` endpoint.
Pick the simpler path for Phase 3 (drawer-only jump). Document the second option as a Phase 3.5 polish.
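
With the drawer-only jump chosen, the short-circuit in `turns.py` reduces to a dispatch on the parsed intent that runs before any narrative prompt is assembled. A minimal sketch, assuming `parse_turn` exposes an `intent` string (the plan says to verify this in code); `ParsedTurn`, `route_turn`, and the `"narrate"` intent value are hypothetical, and the controllers are injected as plain callables:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ParsedTurn:
    """Hypothetical subset of parse_turn's output."""

    intent: str  # e.g. "narrate", "skip_elision", "skip_jump"
    prose: str
    landing_state: str = ""


def route_turn(
    parsed: ParsedTurn,
    run_narrative: Callable[[ParsedTurn], str],
    run_elision: Callable[[str], str],
    jump_fallback: Callable[[], str],
) -> str:
    """Send skip intents to their controllers before any narrative prompt is assembled."""
    if parsed.intent == "skip_elision":
        return run_elision(parsed.landing_state)
    if parsed.intent == "skip_jump":
        # Phase 3 first cut: no inline "anything notable?" prompt; point at the drawer form.
        return jump_fallback()
    return run_narrative(parsed)
```

Because the narrative path is only reached in the final branch, test 3's assertion that `assemble_narrative_prompt` is not called follows directly from the control flow.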
**Tests:** 3 added.
1. `test_elision_skip_via_natural_language` — user prose "skip to when we arrive at the park"; assert `time_skip_elision` event landed and chat clock advanced; an `assistant_turn` carrying transition prose was appended.
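2. `test_jump_skip_via_natural_language_redirects_to_drawer` — user prose "next morning"; assert response is 422 with an HTMX swap pointing at the drawer's jump form (or whatever the chosen Phase 3 fallback is). htmx does not swap 4xx responses by default, so a 422-based fallback also needs the client configured to allow it (`htmx.config.responseHandling` in htmx 2, or an `htmx:beforeSwap` handler).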
3. `test_skip_command_does_not_run_narrative_classifier` — same user prose as test 1; assert `assemble_narrative_prompt` was NOT called for a regular bot turn (the skip path bypasses it).
**Commit:** `feat: natural-language skip detection + skip command flow (T62)`.
---
## Wave 6 — Meanwhile scenes
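Phase 3's capstone and most ambitious feature: scene config 4 (host + guest, no "you"). Per §13 the cap stays at 2 bots in any scene, so a meanwhile scene is bot↔bot between exactly those two. "You" receives a digest afterwards, not while the scene runs.
Decomposed into 3 tasks. T63 lands first (schema + state). T64 and T65 both modify `chat/services/prompt.py`, and T65 hooks into T64's scene-close path, so they are not file-disjoint: run T64, then T65, rather than dispatching them in parallel.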
### Task 63: Meanwhile scene config — schema + state
**Files:**
- Create: `chat/db/migrations/0011_meanwhile_scenes.sql`
- Create: `chat/state/meanwhile.py`
- Create: `tests/test_meanwhile_state.py`
**Spec:** A meanwhile scene is a special kind of scene where `present_set = {host_bot_id, guest_bot_id}` (no "you"). The existing `scenes` table can carry it via a new `present_set_kind` column distinguishing `you_host`, `you_host_guest`, `host_guest`. Alternatively, `meanwhile_scenes` is a sidecar table — pick the lower-disruption option.
**Recommended:** add a `present_set_kind` column to `scenes` (default `'you_host'` for back-compat) via migration `0011_meanwhile_scenes.sql`:
```sql
ALTER TABLE scenes ADD COLUMN present_set_kind TEXT NOT NULL DEFAULT 'you_host';
ALTER TABLE scenes ADD COLUMN parent_scene_id INTEGER; -- the active you-scene this meanwhile branched off from
CREATE INDEX scenes_present_set_idx ON scenes(chat_id, present_set_kind, status);
```
New event kinds with `chat/state/meanwhile.py` handlers:
- `@on("meanwhile_scene_started")` payload: `{chat_id, scene_id, host_bot_id, guest_bot_id, parent_scene_id, started_at}`. Inserts a new scene row with `present_set_kind="host_guest"`, links to parent.
- `@on("meanwhile_scene_closed")` payload: `{scene_id, closed_at}`. Updates status to `closed`; subsequent per-POV summary writes for both bots happen via existing scene-close path (host + guest are the "present witnesses"; "you" is excluded).
Readers: `list_meanwhile_scenes(conn, chat_id, status='active')`, `get_parent_scene(conn, scene_id)`.
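
A minimal sketch of the two handlers and readers against SQLite, assuming handlers receive `(conn, payload)`. The `on` registry below is a local stand-in (the real projector decorator may differ), the cut-down `scenes` table is a demo schema rather than the real one, and the readers return bare scene ids for brevity. The migration statements from `0011_meanwhile_scenes.sql` are reused verbatim so the default `'you_host'` back-fill is exercised:

```python
import sqlite3
from typing import Callable, Dict, List, Optional

Handler = Callable[[sqlite3.Connection, dict], None]
HANDLERS: Dict[str, Handler] = {}


def on(kind: str) -> Callable[[Handler], Handler]:
    """Local stand-in for the projector's @on registry; the real one may differ."""
    def register(fn: Handler) -> Handler:
        HANDLERS[kind] = fn
        return fn
    return register


# Demo schema: a cut-down `scenes` table plus the T63 migration statements verbatim.
DEMO_SCHEMA = """
CREATE TABLE scenes (
    id INTEGER PRIMARY KEY, chat_id INTEGER NOT NULL, status TEXT NOT NULL,
    started_at TEXT, closed_at TEXT
);
ALTER TABLE scenes ADD COLUMN present_set_kind TEXT NOT NULL DEFAULT 'you_host';
ALTER TABLE scenes ADD COLUMN parent_scene_id INTEGER;
CREATE INDEX scenes_present_set_idx ON scenes(chat_id, present_set_kind, status);
"""


@on("meanwhile_scene_started")
def _meanwhile_started(conn: sqlite3.Connection, p: dict) -> None:
    # host_bot_id / guest_bot_id would be recorded wherever Phase 2 stores scene membership.
    conn.execute(
        "INSERT INTO scenes (id, chat_id, status, started_at, present_set_kind, parent_scene_id)"
        " VALUES (?, ?, 'active', ?, 'host_guest', ?)",
        (p["scene_id"], p["chat_id"], p["started_at"], p["parent_scene_id"]),
    )


@on("meanwhile_scene_closed")
def _meanwhile_closed(conn: sqlite3.Connection, p: dict) -> None:
    conn.execute(
        "UPDATE scenes SET status = 'closed', closed_at = ? WHERE id = ?",
        (p["closed_at"], p["scene_id"]),
    )


def list_meanwhile_scenes(conn: sqlite3.Connection, chat_id: int, status: str = "active") -> List[int]:
    rows = conn.execute(
        "SELECT id FROM scenes WHERE chat_id = ? AND present_set_kind = 'host_guest'"
        " AND status = ? ORDER BY id",
        (chat_id, status),
    )
    return [row[0] for row in rows]


def get_parent_scene(conn: sqlite3.Connection, scene_id: int) -> Optional[int]:
    row = conn.execute("SELECT parent_scene_id FROM scenes WHERE id = ?", (scene_id,)).fetchone()
    return row[0] if row else None
```

Scene 1 below plays the main you-scene and scene 2 the meanwhile branch, which is the coexistence case test 3 describes.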
**Tests:** 3 minimum.
1. `test_meanwhile_started_creates_scene_with_correct_present_set_kind`.
2. `test_meanwhile_closed_marks_scene_closed`.
3. `test_active_you_scene_can_coexist_with_active_meanwhile_scene` (one chat, two active scenes — meanwhile + the main you-scene that spawned it).
**Commit:** `feat: meanwhile scene schema + state (T63)`.
---
### Task 64: Meanwhile turn flow
**Files:**
- Modify: `chat/web/turns.py` (add meanwhile-mode detection at the start of `post_turn`; if active meanwhile scene exists for this chat, route to `process_meanwhile_turn`)
- Create: `chat/web/meanwhile.py` (new module hosting `process_meanwhile_turn(...)` controller; mirrors post_turn but with no "you" in present_set)
- Modify: `chat/services/prompt.py` (small addition: when `present_set_kind="host_guest"`, exclude "you" from edges + activity blocks; addressee is always the other bot)
- Create: `tests/test_meanwhile_turn_flow.py`
**Spec:** A meanwhile scene runs entirely between two bots. The user advances it manually via a meanwhile-mode chat surface (T65 as specified covers only the digest, so the UI wiring still needs an explicit owner before dispatch). The turn-flow logic is:
1. Read active meanwhile scene; identify `speaker_bot_id` (alternates each turn — start with host, then guest, etc.) and `addressee_bot_id` (the other one).
2. Assemble narrative prompt with `speaker_bot_id`, `addressee=addressee_bot.name`, `present_set_kind="host_guest"` (so "you" is omitted from edges/activities).
3. Stream narrative; commit `assistant_turn` event with `present_set_kind="host_guest"` and `meanwhile_scene_id` populated.
4. Memory writes: BOTH host and guest get a memory_written with witness `[0, 1, 1]` (you=0, because "you" was not present). Use `record_turn_memory_for_present` adapted to the no-you case (or extend it with a `you_present: bool = True` parameter).
5. State updates: 2 directed pairs (host↔guest only). Skip you-related pairs.
6. Scene close detection: same path as regular scenes; on close, per-POV summaries fire for both bots; group_node updates if applicable.
Speaker alternation is simple: each turn alternates the speaker. (Phase 3.5 may add classifier-driven turn-taking with refusals.)
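
The per-turn bookkeeping in steps 1, 4, and 5 is small enough to sketch as pure helpers. Assumptions: `turn_index` is the number of `assistant_turn` events already committed in this meanwhile scene, and the witness vector is ordered `[you, host, guest]` as the `[0, 1, 1]` example suggests. All function names are hypothetical:

```python
from typing import List, NamedTuple, Tuple


class MeanwhileRoles(NamedTuple):
    speaker_bot_id: int
    addressee_bot_id: int


def roles_for_turn(host_bot_id: int, guest_bot_id: int, turn_index: int) -> MeanwhileRoles:
    """Plain alternation: host speaks on turns 0, 2, 4, ...; guest on 1, 3, 5, ..."""
    if turn_index % 2 == 0:
        return MeanwhileRoles(host_bot_id, guest_bot_id)
    return MeanwhileRoles(guest_bot_id, host_bot_id)


def witness_vector(you_present: bool) -> List[int]:
    """Witness flags in assumed [you, host, guest] order; both bots always witness."""
    return [1 if you_present else 0, 1, 1]


def meanwhile_state_pairs(host_bot_id: int, guest_bot_id: int) -> List[Tuple[int, int]]:
    """The only directed pairs a meanwhile turn updates: host->guest and guest->host."""
    return [(host_bot_id, guest_bot_id), (guest_bot_id, host_bot_id)]
```

Deriving the speaker from the committed turn count, rather than storing it, keeps alternation correct after a regenerate replaces the latest turn.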
**Tests:** 4 minimum.
1. `test_meanwhile_turn_writes_memories_with_witness_0_1_1`.
2. `test_meanwhile_turn_emits_2_edge_updates_only` (host→guest, guest→host).
3. `test_meanwhile_turn_alternates_speaker` (turn 1: host speaks; turn 2: guest speaks).
4. `test_meanwhile_scene_close_writes_per_pov_for_both_bots_only` (no "you" memory; existing T45 path is hit but with `you_present=False`).
**Commit:** `feat: meanwhile turn flow (host+guest, no you) (T64)`.
---
### Task 65: Meanwhile summary digest
**Files:**
- Modify: `chat/services/scene_summarize.py` (when a meanwhile scene closes, also generate a "you-facing digest": a brief narrated summary that will surface to "you" the next time the main you-scene resumes)
- Modify: `chat/services/prompt.py` (when assembling for a regular you-scene and any closed-but-not-yet-surfaced meanwhile digests exist, include them as a SHOULD-tier block titled "Meanwhile while you were away:")
- Create: `chat/state/meanwhile_digest.py` (a small state module: `meanwhile_digest_pending` table; handlers for `meanwhile_digest_created` / `meanwhile_digest_consumed`)
- Modify: `tests/test_per_pov_summary.py` and `tests/test_prompt.py` (add tests)
**Spec:** When a meanwhile scene closes (T64's path), also append `meanwhile_digest_created` with `{chat_id, scene_id, summary}`. The summary is generated via a fresh `summarize_scene` call with `bot_persona="omniscient narrator briefing the absent player"`; output is a 2-3 sentence neutral summary of what happened.
When the next you-scene starts (or the prompt is assembled for the next active you-scene's turn), `assemble_narrative_prompt` queries `list_pending_meanwhile_digests(conn, chat_id)` and:
- Includes them as a SHOULD-tier block: `"Meanwhile while you were away:\n- {summary}\n- {summary}"`.
- After they're surfaced once, the caller (T64 in the post-meanwhile turn or the first you-turn after meanwhile-close) appends `meanwhile_digest_consumed` per digest, marking them as surfaced.
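
The render-then-consume cycle can be sketched with two helpers. Assumptions: pending digests arrive as mappings with `chat_id`, `scene_id`, and `summary` (mirroring the `meanwhile_digest_created` payload), the `meanwhile_digest_consumed` payload reuses `chat_id` and `scene_id` (the plan does not specify it), and both function names are hypothetical:

```python
from typing import Any, Callable, Dict, Mapping, Optional, Sequence


def render_meanwhile_digest_block(digests: Sequence[Mapping[str, Any]]) -> Optional[str]:
    """SHOULD-tier block for pending digests; None when nothing is pending."""
    if not digests:
        return None
    return "Meanwhile while you were away:\n" + "\n".join(f"- {d['summary']}" for d in digests)


def consume_surfaced_digests(
    digests: Sequence[Mapping[str, Any]],
    append_event: Callable[[str, Dict[str, Any]], None],
) -> None:
    """After a prompt surfaced the digests once, mark each one consumed."""
    for d in digests:
        # Payload shape assumed to mirror meanwhile_digest_created's keys.
        append_event("meanwhile_digest_consumed", {"chat_id": d["chat_id"], "scene_id": d["scene_id"]})
```

Consuming exactly the list that was rendered, rather than re-querying, avoids marking a digest consumed that was created after the prompt was assembled.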
Migration `0011_meanwhile_scenes.sql` (T63) can include the `meanwhile_digest_pending` table, or T65 can add a thin `0012_meanwhile_digest.sql`. Pick the lower-disruption option (likely T63's migration, for simplicity) and document the choice; since T63 merges before T65 starts, make that decision while T63 is in flight.
Either way, the schema-version assertion bump in `tests/test_world.py` happens once, after Wave 6 merges.
**Tests:** 3 added.
1. `test_meanwhile_close_creates_digest`: close a meanwhile scene; assert `meanwhile_digest_pending` row exists with non-empty summary.
2. `test_pending_digest_renders_in_you_scene_prompt`: seed a pending digest; assemble prompt for a you-host scene; assert the "Meanwhile while you were away:" header and summary appear.
3. `test_consumed_digest_does_not_render_again`: append `meanwhile_digest_consumed`; reassemble prompt; digest no longer appears.
**Commit:** `feat: meanwhile summary digest surfaces to next you-scene (T65)`.
---
## Wave 7 — Polish (parallel)
Two independent tasks. New test file (T66) + docs only (T67). Dispatch in parallel after Wave 6 merges.
### Task 66: Cross-feature integration tests
**Files:**
- Create: `tests/test_phase3_integration.py`
**Spec:** Phase 3 introduces a lot of cross-feature interaction surfaces. This task adds tests that exercise multi-feature flows end-to-end:
1. Plan an event → play turns → event_started detected → event_completed detected → promotion fires → memory + edge updates land.
2. Open a thread on close → next scene's prompt includes the open thread → close thread via drawer → next scene's prompt no longer includes it.
3. Jump skip → synthesized memories land per present bot → next turn's prompt retrieves them via search.
4. Meanwhile scene → close → digest pending → first you-turn prompt includes digest → after that turn, digest is consumed.
5. Meanwhile while a regular you-scene is active → both scenes have memories; querying memories for either bot at the post-meanwhile main scene correctly returns both sets witness-filtered.
5 tests minimum.
**Commit:** `test: phase 3 cross-feature integration coverage (T66)`.
---
### Task 67: Phase 3 documentation update
**Files:**
- Modify: `CLAUDE.md` (add "Phase 3 status" section; update "Behavioral defaults"; add "Phase 3.5 / 4 backlog" with carry-overs from review feedback during execution)
- Modify: `docs/plans/2026-04-26-v1-requirements-design.md` (annotate §13 "Phase 3 — events, skips, threads" as **Status: shipped <date>**)
**Spec:** Documentation-only. Run last so it captures any deviations and review-noted follow-ups discovered during execution. Reflect:
- Events with full lifecycle (planned → active → completed/cancelled/expired).
- Time skips: elision (immediate end-state) + jump (synthesized memories from "anything notable?").
- Threads opened/updated/closed; surfaced in prompt assembly + drawer.
- Significance retrieval bias + key-quote retention at significance ≥ 2.
- Meanwhile scenes: bot+bot without "you"; per-POV summaries for both bots; you-facing digest on next you-scene.
- Phase 3 known limitations / 3.5 backlog candidates:
  - Natural-language jump skip falls back to drawer form (no inline "anything notable?" prompt).
  - Regenerate doesn't undo prior event transitions from the superseded turn.
  - Meanwhile turn-taking is alternation (no classifier-driven refusals or initiative).
  - Vector retrieval is still Phase 4.
**Commit:** `docs: phase 3 status, behavioral defaults, deferred items (T67)`.
---
## Wrap-up
After Wave 7 lands:
1. **Run full suite** on `phase-3`: should be ~260+ tests passing (212 from Phase 2 + ~50 new).
2. **Manual smoke** (recommended before opening the PR):
- Plan an event from the drawer; play turns until it completes; verify promotion landed (drawer shows updated edges / memories).
- Use elision skips via both natural language and the drawer; use jump skips via the drawer, and confirm that a natural-language jump falls back to the drawer form.
- Close a scene that opened a thread; verify the thread renders in the next scene's prompt.
- Trigger a meanwhile scene from the drawer; play 2 turns; close it; resume the main you-scene; verify the digest renders once and not again.
3. **Push `phase-3`** to Gitea.
4. **Open PR** `phase-3 → main`.
5. **Phase 3.5 backlog candidates** (track in CLAUDE.md): inline natural-language jump prompt UI, regenerate-aware event-transition undo, classifier-driven meanwhile turn-taking, drawer surface for closed-event browsing, event template library (kind presets with default props).
---
## Notes for the controller running this plan
- **Don't dispatch Wave 5b until Wave 5a is merged AND green on `phase-3`.** Wave 5b's `turns.py` modifications layer on top of T61's recent additions; missing that produces merge conflicts or import-time failures.
- **Don't dispatch T64+T65 until T63 merges.** Both depend on the new `present_set_kind` column and the meanwhile event kinds.
- **After each parallel wave**, run a code-review subagent (`subagent-driven-development` skill's two-stage review pattern) on each task before merging to `phase-3`. For purely mechanical tasks (schema migrations, projector handlers), a combined spec+quality review is acceptable. For T62, T64, T65 (large or integration tasks), use separate spec + quality reviewers.
- **If a parallel wave's merge produces a conflict**, the wave's file-disjointness assumption was violated. Bisect the affected pair, fix the offending task in a follow-up commit on `phase-3`, and proceed.
- **Schema-version test bumps** happen at Wave 1 merge (8 → 10) and Wave 6 merge (10 → 11 or 12 depending on T65's migration choice). Update `tests/test_world.py` once per affected merge — same pattern as Phase 2 T36.
- **Token-spend rough estimate**: Phase 3 should be larger than Phase 2 (~1.5×) — events / skips / meanwhile each carry their own state + service + UI surfaces. Per-task token spend similar to Phase 2's larger tasks (T42, T44).
- **DO NOT modify Phase 1 / 2 code paths** unless explicitly required (e.g., T58 modifies `scene_summarize.py` because the new behavior is genuinely additive). Existing 1- and 2-entity flows must continue to work end-to-end after each wave.