chat/docs/plans/2026-04-26-v3-phase3-implementation.md
Joseph Doherty 379054755a docs: add Phase 3 implementation plan with parallel-safe waves
19 tasks across 8 waves covering events with lifecycles, time skips
(elision + jump), active threads, significance/retrieval refinements,
and meanwhile scenes (host+guest with no 'you'). Mirrors the Phase 2
plan structure: pre-flight, parallel-execution strategy with worktree
isolation, file-disjointness analysis per wave, and per-task TDD spec
with commit messages.

Phase 3 schema: adds 0009_events.sql, 0010_threads.sql,
0011_meanwhile_scenes.sql (final version 11). Builds on Phase 2's
3-entity scene support and event-sourced architecture.
2026-04-26 16:55:50 -04:00


Roleplay Engine — Phase 3 Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task. Use the parallel-dispatch pattern documented under "Parallel-Execution Strategy" for waves that fan out to multiple subagents.

Goal: Add events with lifecycles, time skips (elision + jump), active threads, significance/retrieval refinements, and "Meanwhile…" scenes (host+guest with no "you" present). All scoped to a single chat; the cross-chat surface remains unchanged.

Architecture: Builds on Phase 2's event-sourced architecture and 3-entity scene support. New event kinds (event_planned, event_started, event_completed, event_cancelled, event_expired, time_skip_elision, time_skip_jump, thread_opened, thread_updated, thread_closed, meanwhile_scene_started, meanwhile_scene_closed, synthesized_memories) carry the new state changes. Two new tables (events, threads) hold lifecycle state. Existing handlers (memory_written, edge_update) gain new payload sources without changes — promotion logic lives in services, not in projector handlers.

Tech Stack: Same as Phase 2 (Python 3.11+, FastAPI, HTMX, SQLite, Featherless). No new dependencies.

Source-of-truth references:

  • Phase 3 scope: requirements doc §13 "Phase 3 — events, skips, threads"
  • Behavioral details: §4 (per-chat clocks), §6.3 (prompt assembly), §6.4 (drawer), §8.1 (retrieved-memory inputs), §9 ("Time, Skips, Events — Phase 3 surface"), §11 (significance & compression)
  • Conventions: ../../CLAUDE.md §"Behavioral defaults" + §"Phase 2 status"
  • Phase 2 plan (style, TDD pattern, parallel-dispatch mechanics): 2026-04-26-v2-phase2-implementation.md

When a task says "see §X", that's the requirements doc unless stated otherwise.


Pre-flight

Branch: create phase-3 from the latest main after Phase 2 has merged. If Phase 2 is still in PR review, branch off phase-2 directly:

# Option A: after main has phase-2 merged
git checkout main && git pull && git checkout -b phase-3

# Option B: continue from phase-2 directly
git checkout phase-2 && git pull && git checkout -b phase-3

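Schema baseline: Phase 2 leaves the DB at version 8. Wave 1 of Phase 3 adds two migrations: 0009_events.sql and 0010_threads.sql. The commit description for this plan also lists 0011_meanwhile_scenes.sql (final version 11), which belongs to the meanwhile schema task (T63). No other migrations expected.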

Phase 2.5 backlog: the items in CLAUDE.md §"Phase 2.5 / 3 backlog" are NOT scoped here — they should be cleaned up in a separate branch off main (suggested name phase-2.5) before or in parallel with Phase 3. None of them blocks Phase 3.

Pinned non-negotiables (carried forward):

  • State changes go through the event log. Use append_and_apply(conn, kind, payload) for the live path; apply_event only after a fresh append_event returning the new id.
  • Witness filter every memory read at SQL level (hard WHERE constraint; never a soft signal).
  • Edges are directed; botA → botB and botB → botA are independent records.
  • Per-POV scene summaries — never write omniscient narration. (Meanwhile scenes write per-POV summaries for both present bots; you receive a digest later, not during the scene.)
  • TDD: every task starts with a failing test.
  • One commit per task minimum, more if it splits naturally.

Verification before claiming done: Use superpowers-extended-cc:verification-before-completion — run the test command, paste actual output. Don't assume green.


Parallel-Execution Strategy

Same pattern as Phase 2. Eight waves: parallel within each wave (file-disjoint), serial across waves. The controller (you, the controlling Claude session) merges each subagent's commits and verifies the suite stays green before dispatching the next wave.

How to dispatch a wave in parallel

Use the Agent tool with isolation: "worktree" so each subagent gets its own git worktree. The runtime cleans up the worktree automatically if no changes are made; otherwise it returns the path + branch for the controller to merge. (If the controlling session's working directory is not the chat repo, create worktrees manually with git worktree add .worktrees/<wave>-<task> -b <wave>/<task> phase-3 from inside the chat repo and pass the worktree path explicitly into each subagent prompt — that is the pattern Phase 2 used.)

In a single message, dispatch all tasks in the wave:

Agent({
  description: "Wave 1 — T49 events table + handlers",
  subagent_type: "general-purpose",
  isolation: "worktree",
  prompt: "<full task text from below>",
})
Agent({
  description: "Wave 1 — T50 time_skip handlers",
  subagent_type: "general-purpose",
  isolation: "worktree",
  prompt: "<full task text from below>",
})
Agent({
  description: "Wave 1 — T51 threads table + handlers",
  subagent_type: "general-purpose",
  isolation: "worktree",
  prompt: "<full task text from below>",
})

All subagents start simultaneously, each working on a private worktree branched off phase-3. They cannot see each other's changes (no shared filesystem state) — that's the safety guarantee.

After a wave completes

  1. Each subagent returns its worktree path and commit SHA.

  2. Run a spec + code-quality reviewer subagent on each completed task (combined review is acceptable for purely mechanical schema/handler tasks; large or integration tasks like T62, T63 deserve separate spec + quality reviewers).

  3. Merge the wave into phase-3 in any order (file-disjointness guarantees no conflict). Use --no-ff so each task's history stays grouped:

    git checkout phase-3
    for branch in <wave-branches>; do
      git merge --no-ff "$branch" -m "merge: <task description>"
    done
    
  4. Run the full test suite on the merged phase-3. If it's red, the wave's mutual-independence assumption was violated — bisect to find the offending pair, fix in a follow-up commit, re-merge.

  5. Push phase-3 to gitea so the work is durable before the next wave starts.

  6. Optionally clean up worktrees: git worktree remove .worktrees/<branch> and git branch -D <branch>.

Conflict prevention checklist (apply before dispatch)

For each parallel wave, verify the Files sections of all tasks have no overlapping paths. The waves below are designed to satisfy this; if you decide to add or merge tasks, re-check.

If a hot file (chat/web/turns.py, chat/services/prompt.py, chat/web/drawer.py, chat/templates/_drawer.html, chat/services/regenerate.py) needs changes from multiple tasks, do not parallelize them — serialize within the wave or split into separate waves.
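The overlap check above can be scripted before each dispatch. The following is a minimal sketch, not part of the repo: the helper name find_overlaps and the hand-collected task-to-files mapping are assumptions, with paths taken from the Wave 5a and 5b entries of this plan.

```python
from itertools import combinations


def find_overlaps(wave: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return (task_a, task_b, shared_paths) for each pair of tasks touching the same file."""
    overlaps = []
    for (a, files_a), (b, files_b) in combinations(sorted(wave.items()), 2):
        shared = files_a & files_b
        if shared:
            overlaps.append((a, b, shared))
    return overlaps


# Hypothetical input: the "Files" sections of each task, collected by hand.
wave_5a = {
    "T60": {"chat/services/prompt.py", "tests/test_prompt.py"},
    "T61": {"chat/web/turns.py", "chat/services/regenerate.py"},
}
# T62 also edits chat/web/turns.py, which is why it sits in its own wave.
wave_5a_plus_t62 = {**wave_5a, "T62": {"chat/web/turns.py"}}

print(find_overlaps(wave_5a))           # no overlapping pairs
print(find_overlaps(wave_5a_plus_t62))  # T61 and T62 share chat/web/turns.py
```

An empty result means the wave is safe to fan out; any reported pair should be serialized or moved to a separate wave.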

Failure recovery

If one subagent fails (test failures, blocked, infinite loop):

  • Do not block the wave on a failure. Cancel the failed subagent, merge the others' successful work, and re-dispatch the failed task as a single follow-up.
  • If a failure exposes a bad assumption shared by multiple tasks (e.g. an event-payload schema mismatch), pause the wave and revisit the plan.

Why each wave is parallel-safe

| Wave | Tasks | Hot files touched | Disjoint? |
|---|---|---|---|
| 1 | T49, T50, T51 | new SQL migrations + new state modules; T50 also extends chat/state/world.py (additive) | yes |
| 2 | T52, T53, T54, T55 | new service modules only | yes |
| 3 | T56, T57, T58 | new service module (T56) + chat/state/memory.py retrieval extension (T57) + chat/services/scene_summarize.py (T58) | yes |
| 4 | T59 | chat/web/drawer.py, chat/templates/_drawer.html (single task) | yes |
| 5a | T60, T61 | chat/services/prompt.py (T60), chat/web/turns.py (T61) | yes |
| 5b | T62 | chat/web/turns.py, plus a new skip route module (single task; depends on 5a) | yes |
| 6 | T63, T64, T65 | meanwhile is tightly coupled; see Wave 6 sub-structure below | ⚠️ partial |
| 7 | T66, T67 | new test file + docs only | yes |

Wave 6 sub-structure: T63 is schema/state (new files); T64 is service + extends chat/web/turns.py; T65 is service + extends chat/services/prompt.py. T64 and T65 are file-disjoint relative to each other but both depend on T63's schema landing first. Dispatch as: T63 alone → merge → T64+T65 in parallel → merge.


Task overview

Wave 1  ─┬─ T49: events table + lifecycle handlers
         ├─ T50: time_skip event kinds + handlers (advance chat clock)
         └─ T51: threads table + open/update/close handlers

Wave 2  ─┬─ T52: event-lifecycle detection service (narrative → state changes)
         ├─ T53: skip narration service (elision + jump prose)
         ├─ T54: synthesized-memories service (jump skip "anything notable?")
         └─ T55: thread-detection service (on scene close, identify open threads)

Wave 3  ─┬─ T56: event-completion promotion (inventory / edges / memories)
         ├─ T57: significance retrieval ranking refinements
         └─ T58: scene compression keeps key quotes when significance ≥ 2

Wave 4  ─── T59: drawer additions — events panel, threads panel, skip controls

Wave 5a ─┬─ T60: prompt assembly includes active events + active threads
         └─ T61: turn flow invokes event-detection + thread-update per turn

Wave 5b ─── T62: skip command surface (parse + route + jump UI prompt)

Wave 6  ─┬─ T63: meanwhile scene config — schema + state + scene-config-4 marker
         └─ (after T63 merges)
            ├─ T64: meanwhile turn flow (host+guest, no "you")
            └─ T65: meanwhile summary digest (briefs you on next active scene)

Wave 7  ─┬─ T66: cross-feature integration tests (events × skips × threads × meanwhile)
         └─ T67: Phase 3 documentation update

Critical path: 9 sequential merge points (Waves 1, 2, 3, 4, 5a, 5b, 6a, 6b, 7). Total tasks: 19. The wall-clock advantage of parallelism depends on subagent dispatch overhead, but in principle each wave's tasks can run concurrently in roughly the time of one task.


Wave 1 — Schema & state foundation

These three tasks are fully independent: each adds a new SQL migration + new state module. T50 also adds two handlers to chat/state/world.py (additive, alongside Phase 2's _apply_guest_added).

Task 49: Events table + lifecycle handlers

Files:

  • Create: chat/db/migrations/0009_events.sql
  • Create: chat/state/events.py
  • Create: tests/test_events_state.py

Spec: Adds the events table and projector handlers for the lifecycle: event_planned, event_started, event_completed, event_cancelled, event_expired. Each event row carries chat_id, kind (free-form domain-event tag like "date_at_park"), status (planned|active|completed|cancelled|expired), props_json (arbitrary blob), planned_for (ISO-8601 chat-clock string, optional), started_at / completed_at (chat-clock strings).

Step 1: failing test — see pattern in tests/test_group_node.py (Phase 2 T36). Three tests minimum:

  1. test_event_planned_creates_row: append event_planned with kind, props_json, planned_for; project; assert get_event(conn, event_id) returns the row with status="planned".
  2. test_event_started_then_completed_updates_status: append event_planned → event_started → event_completed; assert status transitions and completed_at populated.
  3. test_event_cancelled_terminal: append event_planned → event_cancelled; assert status="cancelled". A subsequent event_started is ignored (handler no-op when status is terminal).

Step 3: implementation (0009_events.sql):

CREATE TABLE events (
    id TEXT PRIMARY KEY,  -- caller-allocated string id (event_id in the payload); INTEGER PRIMARY KEY would reject it
    chat_id TEXT NOT NULL,
    kind TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'planned',
    props_json TEXT NOT NULL DEFAULT '{}',
    planned_for TEXT,
    started_at TEXT,
    completed_at TEXT,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX events_chat_idx ON events(chat_id, status);

chat/state/events.py:

  • @on("event_planned") inserts a new row with status planned. Payload provides a stable event_id (caller-allocated UUID) so the projector is idempotent.
  • @on("event_started") updates status to active and sets started_at from payload (or current chat clock).
  • @on("event_completed"), @on("event_cancelled"), @on("event_expired") each move to the named terminal state and stamp completed_at (the column doubles as "ended at").
  • get_event(conn, event_id), list_active_events(conn, chat_id), list_events_in_status(conn, chat_id, status) readers.
  • All handlers no-op when the row is already in a terminal state (idempotent re-projection safety).
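The handler rules above can be sketched against an in-memory SQLite database. This is an illustration under stated assumptions, not the module itself: the on/HANDLERS registry is a stand-in for the project's @on decorator, the payload field ended_at is hypothetical, and the table is trimmed to the columns the sketch needs (with a text id, since ids are caller-allocated strings). The "or current chat clock" fallback for started_at is left out.

```python
import sqlite3
import uuid

TERMINAL = ("completed", "cancelled", "expired")
HANDLERS = {}  # event kind -> handler; stand-in for the project's @on registry


def on(kind):
    def register(fn):
        HANDLERS[kind] = fn
        return fn
    return register


@on("event_planned")
def apply_event_planned(conn, payload):
    # INSERT OR IGNORE: projecting the same event_id twice leaves one row.
    conn.execute(
        "INSERT OR IGNORE INTO events (id, chat_id, kind, planned_for) VALUES (?, ?, ?, ?)",
        (payload["event_id"], payload["chat_id"], payload["kind"], payload.get("planned_for")),
    )


@on("event_started")
def apply_event_started(conn, payload):
    # The status guard makes this a no-op once the row is terminal.
    conn.execute(
        "UPDATE events SET status = 'active', started_at = ? "
        "WHERE id = ? AND status NOT IN (?, ?, ?)",
        (payload.get("started_at"), payload["event_id"], *TERMINAL),
    )


def make_terminal_handler(status):
    def handler(conn, payload):
        # completed_at doubles as "ended at" for all three terminal states.
        conn.execute(
            "UPDATE events SET status = ?, completed_at = ? "
            "WHERE id = ? AND status NOT IN (?, ?, ?)",
            (status, payload.get("ended_at"), payload["event_id"], *TERMINAL),
        )
    return handler


for status in TERMINAL:
    on(f"event_{status}")(make_terminal_handler(status))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id TEXT PRIMARY KEY, chat_id TEXT NOT NULL, kind TEXT NOT NULL, "
    "status TEXT NOT NULL DEFAULT 'planned', planned_for TEXT, started_at TEXT, completed_at TEXT)"
)
event_id = f"evt_{uuid.uuid4().hex[:12]}"
HANDLERS["event_planned"](conn, {"event_id": event_id, "chat_id": "chat_1", "kind": "date_at_park"})
HANDLERS["event_cancelled"](conn, {"event_id": event_id, "ended_at": "2026-04-26T18:00"})
HANDLERS["event_started"](conn, {"event_id": event_id, "started_at": "2026-04-26T19:00"})  # ignored
row = conn.execute("SELECT status, started_at FROM events WHERE id = ?", (event_id,)).fetchone()
print(row)
```

Putting the terminal-state guard in the WHERE clause keeps each handler a single statement, so re-projecting the log cannot resurrect a cancelled event.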

Step 5: commit: feat: events table + lifecycle handlers (T49).

Notes for the implementer:

  • Use UUID-style ids (e.g., f"evt_{uuid.uuid4().hex[:12]}") created by the caller; pass as event_id in payload. Don't auto-generate inside the projector.
  • Schema version after this migration alone: 9. After Wave 1 merges it is 10 (T51 adds 0010_threads.sql).
  • tests/test_world.py::test_schema_version_after_migration_is_8 will need to bump after Wave 1 merges — handle in the wave-merge step (mirrors Phase 2 T36's pattern).

Task 50: Time-skip event kinds + chat-clock handlers

Files:

  • Modify: chat/state/world.py (add _apply_time_skip_elision, _apply_time_skip_jump; both update chats.time and may reset activity rows)
  • Create: tests/test_time_skip_handlers.py

Spec: Two new event kinds.

  • time_skip_elision payload: {chat_id, new_time}. Handler updates chats.time = ?. Activity rows are NOT reset (the activity that was elided to its end-state is the resolution itself; the caller passes a follow-up activity_changed event when needed).
  • time_skip_jump payload: {chat_id, new_time, reset_activity: bool}. Handler updates chats.time = ?; if reset_activity is true, deletes per-chat activity rows for the participants in that chat (a fresh landing state will be set by a follow-up activity_changed event from the skip service).

These are pure state mutations. The skip entry points fire them via append_and_apply: the drawer skip routes (T59) and the skip command surface (T62).

Tests: 3 minimum.

  1. test_elision_advances_chat_clock_only: seed chat at time T0; append time_skip_elision with new_time=T1; project; assert get_chat(...)["time"] == T1 and activity unchanged.
  2. test_jump_with_reset_clears_activity: seed chat with one activity row; append time_skip_jump with reset_activity=True; assert chat clock advanced AND activity table empty for that chat.
  3. test_jump_without_reset_preserves_activity: same seed; reset_activity=False; assert activity row still present and clock advanced.

Implementation: new handlers next to _apply_chat_created in chat/state/world.py. Use the same parameterized SQL patterns. Do NOT add UI here — T62 wires the skip command flow.
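As a rough illustration of the two handlers and the three tests, the sketch below runs against a throwaway in-memory database. The chats and activity table layouts are assumptions made for the example; the real columns live in the Phase 2 schema and may differ.

```python
import sqlite3


def apply_time_skip_elision(conn, payload):
    # Elision only moves the clock; activity rows are left alone.
    conn.execute("UPDATE chats SET time = ? WHERE id = ?", (payload["new_time"], payload["chat_id"]))


def apply_time_skip_jump(conn, payload):
    conn.execute("UPDATE chats SET time = ? WHERE id = ?", (payload["new_time"], payload["chat_id"]))
    if payload.get("reset_activity"):
        # A follow-up activity_changed event is expected to set the landing state.
        conn.execute("DELETE FROM activity WHERE chat_id = ?", (payload["chat_id"],))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal tables for the sketch.
    CREATE TABLE chats (id TEXT PRIMARY KEY, time TEXT NOT NULL);
    CREATE TABLE activity (chat_id TEXT NOT NULL, entity_id TEXT NOT NULL, activity TEXT NOT NULL);
    INSERT INTO chats VALUES ('chat_1', '2026-04-26T09:00');
    INSERT INTO activity VALUES ('chat_1', 'botA', 'walking to park');
""")


def snapshot():
    clock = conn.execute("SELECT time FROM chats WHERE id = 'chat_1'").fetchone()[0]
    rows = conn.execute("SELECT COUNT(*) FROM activity WHERE chat_id = 'chat_1'").fetchone()[0]
    return clock, rows


apply_time_skip_elision(conn, {"chat_id": "chat_1", "new_time": "2026-04-26T09:30"})
after_elision = snapshot()
apply_time_skip_jump(conn, {"chat_id": "chat_1", "new_time": "2026-04-27T08:00", "reset_activity": False})
after_jump_keep = snapshot()
apply_time_skip_jump(conn, {"chat_id": "chat_1", "new_time": "2026-04-28T08:00", "reset_activity": True})
after_jump_reset = snapshot()
print(after_elision, after_jump_keep, after_jump_reset)
```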

Commit: feat: time_skip event handlers (T50).


Task 51: Threads table + open/update/close handlers

Files:

  • Create: chat/db/migrations/0010_threads.sql
  • Create: chat/state/threads.py
  • Create: tests/test_threads_state.py

Spec: Adds the threads table and projector handlers for thread_opened, thread_updated, thread_closed. A thread is a per-chat narrative continuity tag — open during scenes, surfaced to prompt assembly so successor scenes can reference unresolved arcs.

0010_threads.sql:

CREATE TABLE threads (
    id TEXT PRIMARY KEY,  -- thread_id from the event payload (string, matching existing_thread_id: str)
    chat_id TEXT NOT NULL,
    title TEXT NOT NULL,
    summary TEXT NOT NULL DEFAULT '',
    status TEXT NOT NULL DEFAULT 'open',  -- open | closed
    opened_at TEXT NOT NULL DEFAULT (datetime('now')),
    closed_at TEXT,
    last_referenced_scene_id INTEGER,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX threads_chat_status_idx ON threads(chat_id, status);

chat/state/threads.py:

  • @on("thread_opened") payload: {thread_id, chat_id, title, summary?}. Inserts a new row with status='open'.
  • @on("thread_updated") payload: {thread_id, summary, last_referenced_scene_id?}. Updates summary + optional last-referenced-scene pointer.
  • @on("thread_closed") payload: {thread_id, closed_at?}. Sets status='closed', stamps closed_at.
  • Readers: get_thread(conn, thread_id), list_open_threads(conn, chat_id), list_threads(conn, chat_id, status=None).

Tests: 3 minimum.

  1. test_thread_opened_creates_row.
  2. test_thread_updated_changes_summary_and_last_referenced.
  3. test_thread_closed_terminal: subsequent thread_updated is ignored (matches the design's "closed threads are kept for replay but don't surface in prompt").

Note: the Phase 2 group_node.threads_json column was a Phase-3 placeholder and is NOT used as authoritative storage now — threads table is the source of truth. The drawer can choose to render either, but Phase 3 onward should treat the table as canonical and treat group_node.threads_json as a deprecated cache that we leave alone (or clear in the next migration).

Commit: feat: threads table + projector handlers (T51).


Wave 2 — Classifier services (parallel)

Four tasks, all new service modules — fully file-disjoint.

Task 52: Event-lifecycle detection service

Files:

  • Create: chat/services/event_lifecycle.py
  • Create: tests/test_event_lifecycle.py

Spec: A classifier-wrapped service that inspects a freshly-narrated turn and decides whether any active events transitioned this turn (started, completed, cancelled). Returns a structured EventLifecycleDecision with one or more EventTransition(event_id, new_status, reason) items, or empty when nothing changed.

Schema:

class EventTransition(BaseModel):
    event_id: str
    new_status: str  # "active" | "completed" | "cancelled"
    reason: str = ""

class EventLifecycleDecision(BaseModel):
    transitions: list[EventTransition] = Field(default_factory=list)

Public API:

async def detect_event_transitions(
    client: LLMClient,
    *,
    classifier_model: str,
    narrative_text: str,
    active_events: list[dict],   # [{id, kind, status, props}, ...] from list_active_events
    timeout_s: float = 30.0,
) -> EventLifecycleDecision:
    """Decide whether any active events transitioned this turn. Conservative
    bias — most turns return empty transitions. Trigger only when the
    narrative text clearly resolves or starts a known active event.
    """

Caller (T61 turn flow) appends one event_started / event_completed / event_cancelled event per transition via append_and_apply.

Tests: 3 minimum — happy path with one transition, empty active_events short-circuits without classifier call, classifier failure returns empty default.
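The three behaviors named in the tests can be sketched as follows. This is a hedged outline only: dataclasses stand in for the Pydantic models so the snippet runs on the standard library, and the client.classify(...) call shape and the FakeClient class are assumptions, not the real LLMClient API.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class EventTransition:
    event_id: str
    new_status: str  # "active" | "completed" | "cancelled"
    reason: str = ""


@dataclass
class EventLifecycleDecision:
    transitions: list[EventTransition] = field(default_factory=list)


async def detect_event_transitions(client, *, classifier_model, narrative_text,
                                   active_events, timeout_s=30.0):
    if not active_events:
        return EventLifecycleDecision()  # short-circuit: no classifier call
    known = {e["id"] for e in active_events}
    try:
        raw = await asyncio.wait_for(
            client.classify(model=classifier_model, text=narrative_text, events=active_events),
            timeout=timeout_s,
        )
        # Drop transitions that name an event the caller did not pass in.
        items = [EventTransition(**t) for t in raw if t.get("event_id") in known]
    except Exception:
        return EventLifecycleDecision()  # conservative default on any failure
    return EventLifecycleDecision(items)


class FakeClient:
    """Hypothetical test double; counts calls and optionally fails."""

    def __init__(self, result=None, fail=False):
        self.result, self.fail, self.calls = result or [], fail, 0

    async def classify(self, **kwargs):
        self.calls += 1
        if self.fail:
            raise RuntimeError("classifier down")
        return self.result


events = [{"id": "evt_1", "kind": "date_at_park", "status": "planned", "props": {}}]
good = FakeClient(result=[{"event_id": "evt_1", "new_status": "active", "reason": "they arrived"}])
idle = FakeClient()
broken = FakeClient(fail=True)


def run(client, active):
    return asyncio.run(detect_event_transitions(
        client, classifier_model="m", narrative_text="They reach the park.", active_events=active))


happy = run(good, events)
skipped = run(idle, [])
failed = run(broken, events)
print(happy.transitions, idle.calls, failed.transitions)
```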

Commit: feat: event-lifecycle detection service (T52).


Task 53: Skip narration service

Files:

  • Create: chat/services/skip_narration.py
  • Create: tests/test_skip_narration.py

Spec: Generates the brief transition narration that bridges a time skip. Two flavors mirroring §9:

  • Elision: "skip to when we arrive". Input: current activity ("walking to park"), expected end-state ("at the park, sitting on a bench"). Output: 1-2 sentence transition prose narrated from the host bot's POV. New chat-clock value is provided by the caller.
  • Jump: "next morning". Input: time delta + landing-state hint (optional). Output: 2-3 sentences setting the scene at the new time.

Public API:

async def narrate_skip(
    client: LLMClient,
    *,
    narrative_model: str,
    skip_kind: str,  # "elision" | "jump"
    speaker_bot: dict,     # {id, name, persona}
    you_name: str,
    current_time: str,
    new_time: str,
    current_activity: str,
    landing_state_hint: str = "",
    timeout_s: float = 60.0,
) -> str:
    """Generate brief transition prose. Returns plain text, not JSON."""

Uses client.generate(...) (not classify) since output is free-form prose. Falls back to a deterministic template string on failure (e.g., f"({new_time}: {landing_state_hint or current_activity}.)"). The fallback ensures the skip flow never blocks even when the LLM is down.
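The fallback path described above might look like the sketch below. The wrapper name narrate_with_fallback and the zero-argument generate callable are assumptions standing in for the real client.generate(...) call; only the template string comes from this plan.

```python
import asyncio


def fallback_narration(new_time, current_activity, landing_state_hint=""):
    # Deterministic template from the spec: the new time is always visible.
    return f"({new_time}: {landing_state_hint or current_activity}.)"


async def narrate_with_fallback(generate, *, new_time, current_activity,
                                landing_state_hint="", timeout_s=60.0):
    fallback = fallback_narration(new_time, current_activity, landing_state_hint)
    try:
        text = await asyncio.wait_for(generate(), timeout=timeout_s)
    except Exception:
        return fallback  # LLM down or timed out: the skip flow must not block
    return text.strip() or fallback  # an empty generation also falls back


async def healthy():
    return "The walk blurred past; soon the bench by the pond was in sight."


async def down():
    raise ConnectionError("LLM unavailable")


ok_text = asyncio.run(narrate_with_fallback(
    healthy, new_time="14:30", current_activity="walking to park"))
fallback_text = asyncio.run(narrate_with_fallback(
    down, new_time="14:30", current_activity="walking to park",
    landing_state_hint="at the park, sitting on a bench"))
print(ok_text)
print(fallback_text)
```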

Tests: 3 minimum — happy elision, happy jump, generation failure returns fallback string with the new time visible.

Commit: feat: skip narration service (T53).


Task 54: Synthesized-memories service

Files:

  • Create: chat/services/synthesized_memories.py
  • Create: tests/test_synthesized_memories.py

Spec: When the user does a jump skip ("a week later") they're prompted "anything notable happen?" If they answer with prose, this service parses that prose into 1-N synthesized memories per present bot. Each memory carries source="synthesized", reliability=0.7, witness mask [1, 1, 0] or [1, 1, 1] per present set, and a one-sentence text body.

Schema:

class SynthesizedMemory(BaseModel):
    text: str
    significance: int = 1   # 0..3, default 1
    affinity_delta: int = 0
    trust_delta: int = 0

class SynthesizedDigest(BaseModel):
    memories: list[SynthesizedMemory] = Field(default_factory=list)

Public API:

async def synthesize_memories(
    client: LLMClient,
    *,
    classifier_model: str,
    prose: str,
    bot_name: str,            # which witness's POV
    bot_persona: str,
    you_name: str,
    timeout_s: float = 30.0,
) -> SynthesizedDigest:
    """Parse 'anything notable happen?' prose into structured memories
    from a single bot's POV. Empty/whitespace prose short-circuits."""

Caller (T62 skip flow) calls this once per present bot (host always; guest if present), then writes via record_turn_memory_for_present with source="synthesized" and the synthesized text in place of narrative_text.

Tests: 3 minimum — happy path returns parseable memories, empty prose short-circuits, classifier failure returns empty digest.

Commit: feat: synthesized-memories service for jump skips (T54).


Task 55: Thread-detection service

Files:

  • Create: chat/services/thread_detection.py
  • Create: tests/test_thread_detection.py

Spec: On scene close, classify the scene transcript to detect open threads (unresolved arcs, dangling questions, promises made). Returns a list of ThreadCandidate(title, summary, action: "open"|"update"|"close", existing_thread_id?).

The service receives the current set of open threads so it can decide to update an existing thread rather than open a duplicate. It can also signal close when the transcript clearly resolves an open thread.

Schema:

class ThreadCandidate(BaseModel):
    action: str  # "open" | "update" | "close"
    title: str = ""        # required for "open"; ignored otherwise
    summary: str = ""
    existing_thread_id: str | None = None  # required for "update"/"close"

class ThreadDetectionResult(BaseModel):
    candidates: list[ThreadCandidate] = Field(default_factory=list)

Public API:

async def detect_threads(
    client: LLMClient,
    *,
    classifier_model: str,
    scene_transcript: list[dict],   # [{speaker, text}, ...]
    open_threads: list[dict],       # [{id, title, summary}, ...]
    timeout_s: float = 30.0,
) -> ThreadDetectionResult:
    """Classify scene close into thread open/update/close candidates."""

Caller (T58 scene compression — added in Wave 3) loops over candidates and emits one thread_opened, thread_updated, or thread_closed event per candidate.
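The caller loop can be reduced to a pure mapping from candidate to event. The sketch below assumes candidates arrive as plain dicts and that new thread ids use a thr_ prefix; both are illustrative choices, while the payload shapes follow the T51 handler descriptions.

```python
import uuid


def candidate_to_event(candidate: dict, chat_id: str):
    """Map one thread candidate to (event_kind, payload), or None if it is malformed."""
    action = candidate.get("action")
    if action == "open":
        if not candidate.get("title"):
            return None  # title is required for "open"
        return ("thread_opened", {
            "thread_id": f"thr_{uuid.uuid4().hex[:12]}",  # hypothetical id format
            "chat_id": chat_id,
            "title": candidate["title"],
            "summary": candidate.get("summary", ""),
        })
    thread_id = candidate.get("existing_thread_id")
    if action == "update" and thread_id:
        return ("thread_updated", {"thread_id": thread_id, "summary": candidate.get("summary", "")})
    if action == "close" and thread_id:
        return ("thread_closed", {"thread_id": thread_id})
    return None  # unknown action, or update/close without a target thread


candidates = [
    {"action": "open", "title": "The missing letter", "summary": "Who took it?"},
    {"action": "update", "existing_thread_id": "thr_abc", "summary": "A suspect emerges."},
    {"action": "close", "existing_thread_id": "thr_def"},
    {"action": "update", "summary": "No target thread given."},
]
emitted = [e for e in (candidate_to_event(c, "chat_1") for c in candidates) if e is not None]
print([kind for kind, _ in emitted])
```

Each surviving pair would then be passed to append_and_apply; malformed candidates are dropped rather than raised so one bad classifier item does not block the scene close.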

Tests: 3 minimum — opens a new thread, updates an existing thread (test asserts existing_thread_id is honored), classifier failure returns empty.

Commit: feat: thread-detection service (T55).


Wave 3 — Promotion & retrieval refinements

Three tasks. T56 is a new service module (event-completion promotion). T57 modifies chat/state/memory.py to add a significance-aware retrieval rank. T58 modifies chat/services/scene_summarize.py to integrate compression hints + the thread-detection service from T55. File-disjoint.

Task 56: Event-completion promotion

Files:

  • Create: chat/services/event_promotion.py
  • Create: tests/test_event_promotion.py

Spec: When an event reaches completed (the only terminal state that promotes; cancelled/expired do NOT promote per §9 last paragraph), the orchestrator promotes any structured artifacts the event carried into the appropriate target store:

  • event.props.acquired_objects: list[str] → append inventory_added events (Phase 4 schema; Phase 3 stub: just append a manual_edit with target_kind="memory_pov_summary" describing the acquisition into the host's memory).
  • event.props.knowledge_facts: list[{owner_id, target_id, fact}] → append edge_update events with the facts on the named directed edge.
  • event.props.relationship_change: {summary, source_id, target_id} → append manual_edit with target_kind="edge_summary" for that pair.
  • Everything else stays in the closed event record (the projector kept the row; no further promotion).

Public API:

def promote_completed_event(
    conn,
    *,
    event_id: str,
    chat_id: str,
    chat_clock_at: str | None,
) -> dict:
    """Read the completed event's props_json and emit promotion events.
    Returns a summary dict {inventory: int, knowledge: int, relationship: int}
    of how many promotion events fired. No classifier calls — purely
    structural. Skips if event status isn't 'completed'."""

This is synchronous (no async, no LLM). It reads a row, parses JSON, emits events via append_and_apply.
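The structural part of the promotion can be expressed as a pure function that turns props_json into a list of events to append. This is a sketch under assumptions: the helper name plan_promotions and the exact payload fields of manual_edit and edge_update are hypothetical, and the real function would pass each pair to append_and_apply.

```python
import json


def plan_promotions(props_json: str, status: str) -> list[tuple[str, dict]]:
    """Return (event_kind, payload) pairs to append for a completed event."""
    if status != "completed":
        return []  # cancelled and expired events never promote
    props = json.loads(props_json or "{}")
    planned = []
    for obj in props.get("acquired_objects", []):
        # Phase 3 stub: inventory becomes a note in the host's memory.
        planned.append(("manual_edit", {"target_kind": "memory_pov_summary",
                                        "text": f"Acquired: {obj}"}))
    for fact in props.get("knowledge_facts", []):
        planned.append(("edge_update", {"source_id": fact["owner_id"],
                                        "target_id": fact["target_id"],
                                        "fact": fact["fact"]}))
    change = props.get("relationship_change")
    if change:
        planned.append(("manual_edit", {"target_kind": "edge_summary",
                                        "source_id": change["source_id"],
                                        "target_id": change["target_id"],
                                        "summary": change["summary"]}))
    return planned


props = json.dumps({
    "acquired_objects": ["pressed flower"],
    "knowledge_facts": [{"owner_id": "botA", "target_id": "you", "fact": "allergic to bees"}],
    "relationship_change": {"summary": "closer after the park", "source_id": "botA", "target_id": "you"},
})
completed = plan_promotions(props, "completed")
cancelled = plan_promotions(props, "cancelled")
print([kind for kind, _ in completed], cancelled)
```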

Tests: 4 minimum — empty props no-op, knowledge_facts produces edge_update events, relationship_change produces manual_edit, cancelled-event-doesn't-promote.

Commit: feat: event-completion promotion service (T56).


Task 57: Significance-aware retrieval ranking

Files:

  • Modify: chat/state/memory.py (extend search_memories(conn, owner_id, witness_role, query, k) to add a significance bias to the rank ordering)
  • Modify: tests/test_memory_search.py (or wherever the existing search tests live; add 2 tests)

Spec: Currently search_memories orders by FTS rank only. §11.1 says "Retrieval ranking: significance multiplier applied as score × constant to FTS / vector rank." Phase 3 implements this for FTS only (vector retrieval is Phase 4).

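Change the SQL ORDER BY from ORDER BY rank to an expression that folds in significance, with the bias surfaced as a module-level constant SIGNIFICANCE_RANK_BIAS (start at 0.5; it is a tuning knob that may need adjustment after manual play, so document the choice in a comment). Check the sign convention before writing the expression: if the existing query orders by SQLite FTS5's rank column, lower (more negative) values are better matches, so the bias must be subtracted and the sort left ascending, as in ORDER BY (rank - significance * SIGNIFICANCE_RANK_BIAS). The form ORDER BY (rank + significance * 0.5) DESC is only correct when a higher rank means a better match.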
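To see how the bias interacts with rank, here is a pure-Python sketch of the ordering. It assumes FTS5-style scores where a lower (more negative) rank is a better match, so significance is subtracted; under the opposite convention the sign flips. The row tuples and the function name biased_order are made up for the example.

```python
SIGNIFICANCE_RANK_BIAS = 0.5  # tuning knob; the 0.5 starting value comes from the plan


def biased_order(rows):
    """rows: (memory_id, fts_rank, significance). Lower combined score sorts first."""
    return sorted(rows, key=lambda r: r[1] - r[2] * SIGNIFICANCE_RANK_BIAS)


rows = [
    ("m_low", -1.2, 0),        # same text match as m_high, low significance
    ("m_high", -1.2, 3),       # same text match, high significance
    ("m_best_text", -2.0, 0),  # better text match, low significance
]
ordered = [memory_id for memory_id, _, _ in biased_order(rows)]
print(ordered)
```

With a bias of 0.5, a significance of 3 shifts the score by 1.5, enough here to lift m_high above a better text match; that trade-off is what manual play should tune.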

Tests: 2 added.

  1. test_higher_significance_outranks_equal_rank: seed two memories with identical FTS-matching text but different significance scores; assert the higher-significance row appears first in results.
  2. test_significance_bias_is_constant_module_level: verify the constant is accessible as chat.state.memory.SIGNIFICANCE_RANK_BIAS (so it's tunable without a code change in calling sites).

Commit: feat: significance-aware retrieval ranking (T57).


Task 58: Scene compression keeps key quotes when significance ≥ 2

Files:

  • Modify: chat/services/scene_summarize.py (extend apply_scene_close_summary to also call detect_threads from T55 and emit thread events; extend the per-POV summary to include up to 3 verbatim "key quotes" from the closing scene when scene-max-significance ≥ 2)
  • Modify: tests/test_per_pov_summary.py (add 3 tests for the new behavior)

Spec: §11.1 specifies "Compression: scenes with max-turn-significance ≥ 2 retain key quotes; ≤ 1 collapse fully into the per-POV summary." Implement this:

  • Compute scene max significance from memories.significance rows in this scene.
  • When max < 2: existing behavior unchanged (per-POV summary, no extra quotes).
  • When max ≥ 2: include up to 3 verbatim quote spans (each ≤ 200 chars) in the per-POV summary text. Format: append \n\nKey quotes:\n- "..."\n- "..." to the summary. The summarize_scene classifier already produces the prose; the quote-selection step is a deterministic post-process that picks the top-3 highest-significance turn texts from the scene transcript (truncated).
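The quote-selection post-process is small enough to sketch directly. The function name append_key_quotes and the (text, significance) tuple shape are assumptions; the thresholds and output format follow the bullets above. Ties are broken by transcript order, which is one reasonable reading of "top-3".

```python
MAX_QUOTES = 3
MAX_QUOTE_CHARS = 200


def append_key_quotes(summary: str, turns: list[tuple[str, int]]) -> str:
    """turns: (text, significance) in transcript order."""
    if not turns or max(sig for _, sig in turns) < 2:
        return summary  # low-significance scene: summary is left unchanged
    # sorted() is stable, so equal significance keeps transcript order.
    top = sorted(turns, key=lambda t: -t[1])[:MAX_QUOTES]
    lines = [f'- "{text[:MAX_QUOTE_CHARS]}"' for text, _ in top]
    return summary + "\n\nKey quotes:\n" + "\n".join(lines)


turns = [("I never left.", 3), ("Keep the key.", 2), ("Nice weather.", 1), ("Promise me.", 2)]
high = append_key_quotes("They reconciled at the park.", turns)
low = append_key_quotes("They chatted.", [("Nice weather.", 1)])
print(high)
print(low)
```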

Additionally, after writing per-POV summaries (existing behavior), call detect_threads (from T55) once per close. For each candidate emit the matching thread_opened / thread_updated / thread_closed event via append_and_apply. Failures fall back to no thread changes (existing memory + edge updates still land).

Tests: 3 added.

  1. test_low_significance_scene_omits_quotes: max significance = 1; assert summary text contains no "Key quotes:" header.
  2. test_high_significance_scene_includes_top_3_quotes: seed 4 memories with significance 3, 2, 1, 2; assert summary contains the top-3 (by significance) verbatim turn texts.
  3. test_thread_detection_emits_events: stub detect_threads to return one ThreadCandidate(action="open", ...); assert a thread_opened event landed.

Commit: feat: significance-driven quote retention + thread emission on close (T58).


Wave 4 — Drawer additions (single task)

This wave is one task because all Phase 3 drawer additions touch chat/web/drawer.py and chat/templates/_drawer.html together — splitting would force serial execution with conflicts.

Task 59: Drawer events / threads / skip controls

Files:

  • Modify: chat/web/drawer.py (extend GET /chats/{chat_id}/drawer; add POST /chats/{chat_id}/drawer/event/plan, /drawer/event/cancel/{event_id}, /drawer/skip/elision, /drawer/skip/jump, /drawer/thread/close/{thread_id})
  • Modify: chat/templates/_drawer.html (3 new sections: Events, Threads, Skip controls)
  • Create: tests/test_drawer_events_threads_skip.py

Spec:

GET extension:

  • list_active_events(conn, chat_id) → render in a new "Events" section.
  • list_open_threads(conn, chat_id) → render in a new "Threads" section.
  • A "Skip" subsection with two buttons: "Elision skip" (opens an inline form taking a landing_state_hint) and "Jump skip" (opens an inline form taking target_time ISO + optional notable_prose for the synthesized-memories prompt).

POST routes:

  1. POST /drawer/event/plan — form {kind, planned_for, props_json} → 400-validates JSON, appends event_planned, returns refreshed drawer.
  2. POST /drawer/event/cancel/{event_id} — appends event_cancelled, returns refreshed drawer.
  3. POST /drawer/skip/elision — form {landing_state_hint, new_time} → calls narrate_skip (T53), appends time_skip_elision + an assistant_turn carrying the narration, returns refreshed drawer + chat partial.
  4. POST /drawer/skip/jump — form {new_time, notable_prose, reset_activity} → calls narrate_skip for transition prose, calls synthesize_memories (T54) for each present bot, appends time_skip_jump + memories + transition turn, returns refreshed drawer + chat partial.
  5. POST /drawer/thread/close/{thread_id} — appends thread_closed, returns refreshed drawer.

Template additions:

  • "Events" section listing each active event by kind + planned_for + props.
  • "Threads" section listing each open thread title + summary + a Close button.
  • "Skip" controls under existing Activity section.
  • Forms use HTMX (hx-post, hx-target="#drawer", hx-swap="innerHTML") consistent with Phase 2 drawer patterns.

Tests (tests/test_drawer_events_threads_skip.py): 6 minimum.

  1. GET drawer with no events/threads → no Events/Threads sections rendered.
  2. POST event/plan with valid form → event_planned event appended; drawer body now contains the event kind.
  3. POST event/cancel → event_cancelled appended; drawer no longer lists the event under "Active".
  4. POST skip/elision → time_skip_elision appended, chat clock advanced, narration assistant_turn present in chat history.
  5. POST skip/jump with notable_prose → time_skip_jump + N synthesized memory_written events; assert reliability=0.7 on those rows.
  6. POST thread/close → thread_closed appended; thread no longer in open list.

Commit: feat: drawer events / threads / skip controls (T59).

Notes for implementer:

  • The existing available_guests dropdown helper from T42 is the reference for form-population patterns.
  • For the Jump skip's notable_prose field, treat empty as "no synthesized memories" (just advance the clock) — the spec allows this.
  • Validate the skip target time (the new_time form field, passed on as target_time) as ISO format; return 400 on parse failure. Do not allow a target time earlier than the current chat clock.
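
A minimal sketch of the target-time check, assuming the chat clock is available as a datetime. The helper name parse_target_time is hypothetical, and the rule that an equal time is accepted is an assumption (the note only forbids earlier times). A route could map the ValueError to a 400.

```python
from datetime import datetime


def parse_target_time(raw: str, current_clock: datetime) -> datetime:
    """Parse an ISO-8601 string and reject values earlier than the chat clock.

    Hypothetical helper: raises ValueError for both failure cases so the
    calling route can translate it into a 400 response.
    """
    target = datetime.fromisoformat(raw)  # raises ValueError on malformed input
    # Comparing naive and aware datetimes raises TypeError, so reject it early.
    if (target.tzinfo is None) != (current_clock.tzinfo is None):
        raise ValueError("target_time and chat clock disagree on timezone awareness")
    if target < current_clock:
        raise ValueError("target_time is earlier than the current chat clock")
    return target
```

Keeping the check in one function lets the drawer routes and any later natural-language path share the same rule.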

Wave 5a — Prompt + turn-flow integration (parallel)

T60 modifies chat/services/prompt.py. T61 modifies chat/web/turns.py. File-disjoint.

Task 60: Prompt assembly includes active events + active threads

Files:

  • Modify: chat/services/prompt.py (extend assemble_narrative_prompt)
  • Modify: tests/test_prompt.py (add 3 tests)

Spec: Two new SHOULD-tier blocks added between the existing scene-context block and retrieved-memories block:

  1. Active events — title Active events:. Lists each active event in this chat: - {kind} (planned for {planned_for}) plus a one-line props excerpt (truncate to ~80 chars). Trim-tier SHOULD; drops before retrieved memories under tight budget.
  2. Active threads — title Open threads:. Lists each open thread: - {title}: {summary} (summary truncated to ~120 chars). SHOULD-tier.

Both blocks are omitted entirely when their lists are empty (no header rendered).

Per Phase 2 T43's auto-detection precedent, the function reads list_active_events(conn, chat_id) and list_open_threads(conn, chat_id) itself; no new parameters.
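
The two blocks can be sketched as pure rendering functions. This is an illustration only: the ActiveEvent and OpenThread dataclasses, the helper names, and the exact placement of the props excerpt on the line are assumptions, not the existing prompt.py API. Returning None models "no header rendered" for an empty list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ActiveEvent:  # hypothetical row shape
    kind: str
    planned_for: str
    props_excerpt: str


@dataclass
class OpenThread:  # hypothetical row shape
    title: str
    summary: str


def truncate(text: str, limit: int) -> str:
    """Cut text to at most `limit` characters, marking the cut with an ellipsis."""
    if len(text) <= limit:
        return text
    return text[: limit - 1] + "…"


def render_active_events(events: List[ActiveEvent]) -> Optional[str]:
    """Return the 'Active events:' block, or None when there is nothing to show."""
    if not events:
        return None
    lines = ["Active events:"]
    for ev in events:
        excerpt = truncate(ev.props_excerpt, 80)
        lines.append(f"- {ev.kind} (planned for {ev.planned_for}): {excerpt}")
    return "\n".join(lines)


def render_open_threads(threads: List[OpenThread]) -> Optional[str]:
    """Return the 'Open threads:' block, or None when there is nothing to show."""
    if not threads:
        return None
    lines = ["Open threads:"]
    for th in threads:
        lines.append(f"- {th.title}: {truncate(th.summary, 120)}")
    return "\n".join(lines)
```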

Tests: 3 added.

  1. test_assemble_with_no_events_or_threads_omits_blocks — regression; no events/threads → assembled prompt has neither block.
  2. test_assemble_with_active_events_renders_block — seed one event_planned + event_started; assert "Active events:" header and event kind appear in prompt.
  3. test_assemble_with_open_thread_renders_block — seed one thread_opened; assert "Open threads:" header and thread title appear.

Commit: feat: prompt assembly renders active events + open threads (T60).


Task 61: Turn flow invokes event-detection + thread-update per turn

Files:

  • Modify: chat/web/turns.py (after the primary narrative + memory + state-update block, call detect_event_transitions from T52; emit event_started/event_completed/event_cancelled events accordingly)
  • Modify: chat/services/regenerate.py (mirror — regenerate also re-detects event transitions for the regenerated turn)
  • Modify: tests/test_turn_flow.py (add 3 tests)

Spec: After the existing post-turn classifier passes (memory write, state update, interjection check) and BEFORE scene-close detection, call detect_event_transitions with narrative_text=primary_text and active_events=list_active_events(conn, chat_id).

For each EventTransition returned:

  • new_status="active" → append event_started payload {event_id, started_at: chat.time}.
  • new_status="completed" → append event_completed payload {event_id, completed_at: chat.time} AND THEN call promote_completed_event (T56) inline so promotion events emit synchronously after completion.
  • new_status="cancelled" → append event_cancelled. Promotion is skipped.

Empty transitions list = no-op (most turns; no extra events written).
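
The mapping above can be sketched as a small dispatcher. The EventTransition shape, the append and promote callables, and the event_cancelled payload are assumptions for illustration; the real code would call the project's event-log writer and promote_completed_event (T56).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable


@dataclass
class EventTransition:  # hypothetical shape of the classifier's output
    event_id: int
    new_status: str


def apply_transitions(
    transitions: Iterable[EventTransition],
    chat_time: str,
    append: Callable[[str, Dict[str, Any]], None],
    promote: Callable[[int], None],
) -> None:
    """Append one lifecycle event per transition; promote right after completion."""
    for t in transitions:
        if t.new_status == "active":
            append("event_started", {"event_id": t.event_id, "started_at": chat_time})
        elif t.new_status == "completed":
            append("event_completed", {"event_id": t.event_id, "completed_at": chat_time})
            promote(t.event_id)  # runs inline, after the completion event is appended
        elif t.new_status == "cancelled":
            # Payload shape assumed; promotion is deliberately skipped.
            append("event_cancelled", {"event_id": t.event_id})
```

An empty transitions list falls straight through the loop, which matches the no-op case.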

regenerate.py mirrors the same logic for the regenerated turn (existing event transitions from the superseded turn are NOT undone — that's a Phase 3.5 follow-up; document the limitation).

Tests: 3 added to tests/test_turn_flow.py.

  1. test_turn_with_event_transition_appends_started_event: mock detect_event_transitions to return one transition; assert event_started lands in event log; canned-response queue matches.
  2. test_turn_with_event_completion_runs_promotion: same mock returning new_status="completed"; seed a planned event with knowledge_facts in props; assert event_completed + edge_update (from promotion) both land.
  3. test_turn_with_no_active_events_skips_classifier: no active events; assert detect_event_transitions is never called (its canned response slot would still be in the queue at end of test).

Commit: feat: per-turn event-lifecycle detection + completion promotion (T61).


Wave 5b — Skip command flow (single task)

Single task because it modifies chat/web/turns.py (which Wave 5a also touched). Run after Wave 5a is merged so the file's recent additions are stable.

Task 62: Skip command surface

Files:

  • Modify: chat/web/turns.py (extend parse_turn to detect natural-language skip commands like "skip to the park", "next morning", "a week later" and route to a skip-handling branch BEFORE the normal narrative flow)
  • Create: chat/web/skip.py (new module hosting process_elision_skip(...) and process_jump_skip(...) controllers; called by both turns.py and the drawer skip routes from T59)
  • Modify: tests/test_turn_flow.py (add 3 tests)

Spec: Currently parse_turn extracts the user's prose into structured fields (addressee inferred, etc.). Phase 3 adds detection of skip commands as a separate intent.

The classifier-based parse already produces an intent field (or similar — verify in code). Extend the schema with intent="skip_elision" and intent="skip_jump". When intent is one of these, the turn flow short-circuits the normal narrative path and routes to:

  • process_elision_skip(conn, client, settings, *, chat_id, landing_state_hint=parsed.landing_state) — calls narrate_skip(skip_kind="elision"), appends time_skip_elision, assistant_turn carrying narration, returns 204.
  • process_jump_skip(conn, client, settings, *, chat_id, target_time=parsed.target_time, notable_prose=parsed.notable_prose) — appends time_skip_jump, calls synthesize_memories per present bot, appends synthesized memory_written events, calls narrate_skip(skip_kind="jump"), appends assistant_turn carrying transition prose, returns 204.

The drawer routes from T59 share these functions (don't duplicate the logic across drawer.py and turns.py).

For Phase 3's first cut, JUMP skip's notable_prose is NOT collected from natural-language ("a week later, anything notable?" requires a UI prompt). Two options:

  • (simpler) Drawer-only entry for jump skip; natural-language jump short-circuits to drawer prompt.
  • (better UX) Natural-language jump returns a 422 with an HTMX-swap that injects the "anything notable?" textarea into the chat surface; user submits prose to a follow-up /chats/{chat_id}/skip/jump/confirm endpoint.

Pick the simpler path for Phase 3 (drawer-only jump). Document the second option as a Phase 3.5 polish.
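
A rough sketch of the resulting routing decision, assuming the parsed turn exposes an intent string (the plan itself says to verify this in code). The function name and return labels are hypothetical; they only name which branch the turn flow would take.

```python
def route_turn(intent: str) -> str:
    """Pick the branch for a parsed turn under the drawer-only jump choice."""
    if intent == "skip_elision":
        # Natural-language elision is handled directly by the skip controller.
        return "process_elision_skip"
    if intent == "skip_jump":
        # Jump needs notable_prose, which is only collected in the drawer form.
        return "drawer_jump_form"
    # Anything else goes through the normal narrative flow.
    return "narrative"
```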

Tests: 3 added.

  1. test_elision_skip_via_natural_language — user prose "skip to when we arrive at the park"; assert time_skip_elision event landed and chat clock advanced; an assistant_turn carrying transition prose was appended.
  2. test_jump_skip_via_natural_language_redirects_to_drawer — user prose "next morning"; assert response is 422 with an HTMX swap pointing at the drawer's jump form (or whatever the chosen Phase 3 fallback is).
  3. test_skip_command_does_not_run_narrative_classifier — same user prose as test 1; assert assemble_narrative_prompt was NOT called for a regular bot turn (the skip path bypasses it).

Commit: feat: natural-language skip detection + skip command flow (T62).


Wave 6 — Meanwhile scenes

Phase 3's capstone feature and its most ambitious: scene config 4 (host + guest, no "you"). Per §13 the cap stays at 2 bots in any scene, so a meanwhile scene is a two-bot, bot↔bot exchange. "You" receives a digest afterwards, not during the scene.

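Decomposed into 3 tasks. T63 lands first (schema + state); then T64 + T65 in parallel. Note that T64 and T65 both list chat/services/prompt.py under Files, so the two tasks are not fully file-disjoint; keep their prompt.py changes in separate blocks and expect a small merge there.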

Task 63: Meanwhile scene config — schema + state

Files:

  • Create: chat/db/migrations/0011_meanwhile_scenes.sql
  • Create: chat/state/meanwhile.py
  • Create: tests/test_meanwhile_state.py

Spec: A meanwhile scene is a special kind of scene where present_set = {host_bot_id, guest_bot_id} (no "you"). The existing scenes table can carry it via a new present_set_kind column distinguishing you_host, you_host_guest, host_guest. Alternatively, meanwhile_scenes is a sidecar table — pick the lower-disruption option.

Recommended: add a present_set_kind column to scenes (default 'you_host' for back-compat) via migration 0011_meanwhile_scenes.sql:

ALTER TABLE scenes ADD COLUMN present_set_kind TEXT NOT NULL DEFAULT 'you_host';
ALTER TABLE scenes ADD COLUMN parent_scene_id INTEGER;  -- the active you-scene this meanwhile branched off from
CREATE INDEX scenes_present_set_idx ON scenes(chat_id, present_set_kind, status);
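
The migration can be sanity-checked against an in-memory SQLite database. This is a sketch: the three-column scenes table below is a stand-in for the real Phase 2 schema, which is not shown here, and demo_migration is a hypothetical name.

```python
import sqlite3

MIGRATION = """
ALTER TABLE scenes ADD COLUMN present_set_kind TEXT NOT NULL DEFAULT 'you_host';
ALTER TABLE scenes ADD COLUMN parent_scene_id INTEGER;
CREATE INDEX scenes_present_set_idx ON scenes(chat_id, present_set_kind, status);
"""


def demo_migration():
    """Apply the migration to a stand-in scenes table and return the resulting rows."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for the pre-migration table; the real one has more columns.
    conn.execute(
        "CREATE TABLE scenes (id INTEGER PRIMARY KEY, chat_id INTEGER NOT NULL, status TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO scenes (chat_id, status) VALUES (1, 'active')")
    conn.executescript(MIGRATION)
    # A meanwhile scene that branches off the existing you-scene (id 1).
    conn.execute(
        "INSERT INTO scenes (chat_id, status, present_set_kind, parent_scene_id) "
        "VALUES (1, 'active', 'host_guest', 1)"
    )
    rows = conn.execute(
        "SELECT id, present_set_kind, parent_scene_id FROM scenes ORDER BY id"
    ).fetchall()
    conn.close()
    return rows
```

The pre-existing row picks up the 'you_host' default, which is the back-compat behaviour the column default is meant to provide, and both scenes remain active in the same chat.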

New event kinds with chat/state/meanwhile.py handlers:

  • @on("meanwhile_scene_started") payload: {chat_id, scene_id, host_bot_id, guest_bot_id, parent_scene_id, started_at}. Inserts a new scene row with present_set_kind="host_guest", links to parent.
  • @on("meanwhile_scene_closed") payload: {scene_id, closed_at}. Updates status to closed; subsequent per-POV summary writes for both bots happen via existing scene-close path (host + guest are the "present witnesses"; "you" is excluded).

Readers: list_meanwhile_scenes(conn, chat_id, status='active'), get_parent_scene(conn, scene_id).

Tests: 3 minimum.

  1. test_meanwhile_started_creates_scene_with_correct_present_set_kind.
  2. test_meanwhile_closed_marks_scene_closed.
  3. test_active_you_scene_can_coexist_with_active_meanwhile_scene (one chat, two active scenes — meanwhile + the main you-scene that spawned it).

Commit: feat: meanwhile scene schema + state (T63).


Task 64: Meanwhile turn flow

Files:

  • Modify: chat/web/turns.py (add meanwhile-mode detection at the start of post_turn; if active meanwhile scene exists for this chat, route to process_meanwhile_turn)
  • Create: chat/web/meanwhile.py (new module hosting process_meanwhile_turn(...) controller; mirrors post_turn but with no "you" in present_set)
  • Modify: chat/services/prompt.py (small addition: when present_set_kind="host_guest", exclude "you" from edges + activity blocks; addressee is always the other bot)
  • Create: tests/test_meanwhile_turn_flow.py

Spec: A meanwhile scene runs entirely between two bots. The user can advance it manually via a meanwhile-mode chat surface (T65 wires the UI), but turn-flow logic is:

  1. Read active meanwhile scene; identify speaker_bot_id (alternates each turn — start with host, then guest, etc.) and addressee_bot_id (the other one).
  2. Assemble narrative prompt with speaker_bot_id, addressee=addressee_bot.name, present_set_kind="host_guest" (so "you" is omitted from edges/activities).
  3. Stream narrative; commit assistant_turn event with present_set_kind="host_guest" and meanwhile_scene_id populated.
  4. Memory writes: BOTH host and guest get a memory_written with witness [0, 1, 1] (you=0, because "you" was not present). Use record_turn_memory_for_present adapted to the no-you case (or extend it with a you_present: bool = True parameter).
  5. State updates: 2 directed pairs (host↔guest only). Skip you-related pairs.
  6. Scene close detection: same path as regular scenes; on close, per-POV summaries fire for both bots; group_node updates if applicable.

Addressee-alternation: simple — each turn alternates speaker. (Phase 3.5 may add classifier-driven turn-taking with refusals.)
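
The alternation, witness flags, and directed pairs from the steps above can be sketched as three small helpers. All names are hypothetical, and the witness ordering [you, host, guest] is inferred from the "[0, 1, 1] (you=0)" note rather than confirmed against the existing code.

```python
from typing import List, Tuple


def speaker_for_turn(turn_index: int, host_bot_id: int, guest_bot_id: int) -> Tuple[int, int]:
    """Return (speaker, addressee) for a 0-based meanwhile turn index; host goes first."""
    if turn_index % 2 == 0:
        return host_bot_id, guest_bot_id
    return guest_bot_id, host_bot_id


def witness_flags(you_present: bool = True) -> List[int]:
    """Witness vector in assumed [you, host, guest] order."""
    return [1 if you_present else 0, 1, 1]


def directed_pairs(host_bot_id: int, guest_bot_id: int) -> List[Tuple[int, int]]:
    """The only two edges updated in a meanwhile turn: host→guest and guest→host."""
    return [(host_bot_id, guest_bot_id), (guest_bot_id, host_bot_id)]
```

Deriving the speaker from the turn index keeps alternation stateless; a stored "last speaker" field would work equally well if turns can be skipped.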

Tests: 4 minimum.

  1. test_meanwhile_turn_writes_memories_with_witness_0_1_1.
  2. test_meanwhile_turn_emits_2_edge_updates_only (host→guest, guest→host).
  3. test_meanwhile_turn_alternates_speaker (turn 1: host speaks; turn 2: guest speaks).
  4. test_meanwhile_scene_close_writes_per_pov_for_both_bots_only (no "you" memory; existing T45 path is hit but with you_present=False).

Commit: feat: meanwhile turn flow (host+guest, no you) (T64).


Task 65: Meanwhile summary digest

Files:

  • Modify: chat/services/scene_summarize.py (when a meanwhile scene closes, also generate a "you-facing digest": a brief narrated summary that will surface to "you" the next time the main you-scene resumes)
  • Modify: chat/services/prompt.py (when assembling for a regular you-scene and any closed-but-not-yet-surfaced meanwhile digests exist, include them as a SHOULD-tier block titled "Meanwhile while you were away:")
  • Create: chat/state/meanwhile_digest.py (a small state module: meanwhile_digest_pending table; handlers for meanwhile_digest_created / meanwhile_digest_consumed)
  • Modify: tests/test_per_pov_summary.py and tests/test_prompt.py (add tests)

Spec: When a meanwhile scene closes (T64's path), also append meanwhile_digest_created with {chat_id, scene_id, summary}. The summary is generated via a fresh summarize_scene call with bot_persona="omniscient narrator briefing the absent player"; output is a 2-3 sentence neutral summary of what happened.

When the next you-scene starts (or the prompt is assembled for the next active you-scene's turn), assemble_narrative_prompt queries list_pending_meanwhile_digests(conn, chat_id) and:

  • Includes them as a SHOULD-tier block: "Meanwhile while you were away:\n- {summary}\n- {summary}".
  • After they're surfaced once, the caller (T64 in the post-meanwhile turn or the first you-turn after meanwhile-close) appends meanwhile_digest_consumed per digest, marking them as surfaced.
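
The pending/consumed lifecycle can be illustrated by folding over an in-memory event list. This is a sketch under assumptions: the real implementation reads the meanwhile_digest_pending table, and the meanwhile_digest_consumed payload is assumed here to carry scene_id.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

DIGEST_HEADER = "Meanwhile while you were away:"


def pending_digests(events: Iterable[Tuple[str, Dict[str, Any]]]) -> List[str]:
    """Return summaries that were created but not yet consumed, in creation order."""
    pending: Dict[Any, str] = {}
    for kind, payload in events:
        if kind == "meanwhile_digest_created":
            pending[payload["scene_id"]] = payload["summary"]
        elif kind == "meanwhile_digest_consumed":
            pending.pop(payload["scene_id"], None)  # assumed payload key
    return list(pending.values())


def render_digest_block(summaries: List[str]) -> Optional[str]:
    """Render the SHOULD-tier block, or None when no digest is pending."""
    if not summaries:
        return None
    return "\n".join([DIGEST_HEADER] + [f"- {s}" for s in summaries])
```

Because consumption is a separate event, a digest keeps rendering until the caller appends meanwhile_digest_consumed, which is what makes "surfaced once" the caller's responsibility.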

Migration 0011_meanwhile_scenes.sql (T63) can include the meanwhile_digest_pending table OR T65 adds a thin 0012_meanwhile_digest.sql. Pick lower-disruption — likely add to T63's migration for simplicity. Document the choice.

(If you choose to add the table in T65 via a new migration, add 0012_meanwhile_digest.sql. The schema-version assertion bump in tests/test_world.py happens once after Wave 6 merges.)

Tests: 3 added.

  1. test_meanwhile_close_creates_digest: close a meanwhile scene; assert meanwhile_digest_pending row exists with non-empty summary.
  2. test_pending_digest_renders_in_you_scene_prompt: seed a pending digest; assemble prompt for a you-host scene; assert the "Meanwhile while you were away:" header and summary appear.
  3. test_consumed_digest_does_not_render_again: append meanwhile_digest_consumed; reassemble prompt; digest no longer appears.

Commit: feat: meanwhile summary digest surfaces to next you-scene (T65).


Wave 7 — Polish (parallel)

Two independent tasks. New test file (T66) + docs only (T67). Dispatch in parallel after Wave 6 merges.

Task 66: Cross-feature integration tests

Files:

  • Create: tests/test_phase3_integration.py

Spec: Phase 3 introduces a lot of cross-feature interaction surfaces. This task adds tests that exercise multi-feature flows end-to-end:

  1. Plan an event → play turns → event_started detected → event_completed detected → promotion fires → memory + edge updates land.
  2. Open a thread on close → next scene's prompt includes the open thread → close thread via drawer → next scene's prompt no longer includes it.
  3. Jump skip → synthesized memories land per present bot → next turn's prompt retrieves them via search.
  4. Meanwhile scene → close → digest pending → first you-turn prompt includes digest → after that turn, digest is consumed.
  5. Meanwhile while a regular you-scene is active → both scenes have memories; querying memories for either bot at the post-meanwhile main scene correctly returns both sets witness-filtered.

5 tests minimum.

Commit: test: phase 3 cross-feature integration coverage (T66).


Task 67: Phase 3 documentation update

Files:

  • Modify: CLAUDE.md (add "Phase 3 status" section; update "Behavioral defaults"; add "Phase 3.5 / 4 backlog" with carry-overs from review feedback during execution)
  • Modify: docs/plans/2026-04-26-v1-requirements-design.md (annotate §13 "Phase 3 — events, skips, threads" as Status: shipped)

Spec: Documentation-only. Run last so it captures any deviations and review-noted follow-ups discovered during execution. Reflect:

  • Events with full lifecycle (planned → active → completed/cancelled/expired).
  • Time skips: elision (immediate end-state) + jump (synthesized memories from "anything notable?").
  • Threads opened/updated/closed; surfaced in prompt assembly + drawer.
  • Significance retrieval bias + key-quote retention at significance ≥ 2.
  • Meanwhile scenes: bot+bot without "you"; per-POV summaries for both bots; you-facing digest on next you-scene.
  • Phase 3 known limitations / 3.5 backlog candidates:
    • Natural-language jump skip falls back to drawer form (no inline "anything notable?" prompt).
    • Regenerate doesn't undo prior event transitions from the superseded turn.
    • Meanwhile turn-taking is alternation (no classifier-driven refusals or initiative).
    • Vector retrieval is still Phase 4.

Commit: docs: phase 3 status, behavioral defaults, deferred items (T67).


Wrap-up

After Wave 7 lands:

  1. Run full suite on phase-3: should be ~260+ tests passing (212 from Phase 2 + ~50 new).
  2. Manual smoke (recommended before opening the PR):
    • Plan an event from the drawer; play turns until it completes; verify promotion landed (drawer shows updated edges / memories).
    • Use elision and jump skips both via natural language and the drawer.
    • Close a scene that opened a thread; verify the thread renders in the next scene's prompt.
    • Trigger a meanwhile scene from the drawer; play 2 turns; close it; resume the main you-scene; verify the digest renders once and not again.
  3. Push phase-3 to gitea.
  4. Open PR phase-3 → main.
  5. Phase 3.5 backlog candidates (track in CLAUDE.md): inline natural-language jump prompt UI, regenerate-aware event-transition undo, classifier-driven meanwhile turn-taking, drawer surface for closed-event browsing, event template library (kind presets with default props).

Notes for the controller running this plan

  • Don't dispatch Wave 5b until Wave 5a is merged AND green on phase-3. Wave 5b's turns.py modifications layer on top of T61's recent additions; missing that produces merge conflicts or import-time failures.
  • Don't dispatch T64+T65 until T63 merges. Both depend on the new present_set_kind column and the meanwhile event kinds.
  • After each parallel wave, run a code-review subagent (subagent-driven-development skill's two-stage review pattern) on each task before merging to phase-3. For purely mechanical tasks (schema migrations, projector handlers), a combined spec+quality review is acceptable. For T62, T64, T65 (large or integration tasks), use separate spec + quality reviewers.
  • If a parallel wave's merge produces a conflict, the wave's file-disjointness assumption was violated. Bisect the affected pair, fix the offending task in a follow-up commit on phase-3, and proceed.
  • Schema-version test bumps happen at Wave 1 merge (8 → 10) and Wave 6 merge (10 → 11 or 12 depending on T65's migration choice). Update tests/test_world.py once per affected merge — same pattern as Phase 2 T36.
  • Token-spend rough estimate: Phase 3 should be larger than Phase 2 (~1.5×) — events / skips / meanwhile each carry their own state + service + UI surfaces. Per-task token spend similar to Phase 2's larger tasks (T42, T44).
  • DO NOT modify Phase 1 / 2 code paths unless explicitly required (e.g., T58 modifies scene_summarize.py because the new behavior is genuinely additive). Existing 1- and 2-entity flows must continue to work end-to-end after each wave.