13 tasks across 6 waves (1, 2, 3, 4a, 4b, 5). Designed for parallel
subagent execution where file-disjointness allows.
Waves 1, 2, 4a, and 5 each contain 2-3 tasks that touch disjoint files
and can be dispatched concurrently via the Agent tool with
isolation: "worktree". Waves 3 (drawer guest support) and 4b (multi-
entity turn flow) are single-task because they touch hot files
(_drawer.html, turns.py) that cannot be safely co-modified.
Plan covers:
- T36: group_node schema + handlers (new migration 0008)
- T37: guest_added / guest_removed event handlers (modifies world.py)
- T38: relationship-seed service ("have they met?")
- T39: interjection classifier service
- T40: multi-entity state-update coordinator (6 directed pairs)
- T41: multi-witness memory write helper
- T42: drawer guest add/remove UI + render
- T43: multi-entity prompt assembly (extends T18)
- T44: multi-entity turn flow (rewrites post_turn)
- T45: multi-entity per-POV summaries on scene close
- T46: witness filter cross-coverage tests
- T47: bot_reset cascades to guest references
- T48: Phase 2 documentation update
Plan also documents:
- Worktree-per-subagent dispatch pattern using Agent isolation flag
- Merge ordering per wave (file-disjointness = conflict-free merges)
- Failure recovery (cancel failed parallel task, re-dispatch as solo)
- Conflict prevention checklist (verify Files sections disjoint per wave)
Tasks file (.tasks.json) carries dependency DAG with `blockedBy` and
`parallelGroup` so a future executing-plans run can dispatch correctly.
NOT EXECUTING. Plan only.
Roleplay Engine — Phase 2 Implementation Plan
For Claude: REQUIRED SUB-SKILL: Use `superpowers-extended-cc:executing-plans` to implement this plan task-by-task. Use `superpowers-extended-cc:dispatching-parallel-agents` for the parallel waves below.
Goal: Add multi-entity (3-entity) scene support: guest bot can be added to a host's chat; up to 3 entities present simultaneously (you + host + guest); turn flow handles silent witnesses, interjections, per-pair edges, and per-witness memory; drawer reflects guest state; scene close writes per-POV summaries for each present witness.
Architecture: Builds on Phase 1's event-sourced architecture. New event kinds (guest_added, guest_removed, group_node_initialized) carry the multi-entity state changes; existing handlers (edge_update, memory_written) already accept any source_id/target_id and witness mask, so most schema work is additive. The "have they met?" first-co-appearance prompt runs once per (botA, botB) pair and seeds initial inter-bot edges via existing edge_update events.
Tech Stack: Same as Phase 1 (Python 3.11+, FastAPI, HTMX, SQLite, Featherless). No new dependencies.
Source-of-truth references:
- Phase 2 scope: requirements doc §13 "Phase 2 — multi-entity"
- Behavioral details: requirements doc §6.2 (turn-taking with interjection), §7.5 (guest leaves), §8.5 (memory ownership), §11.2 (per-POV summaries on close)
- Conventions: ../../CLAUDE.md §"Behavioral defaults"
- Phase 1 plan (style, TDD pattern): 2026-04-26-v1-phase1-implementation.md
When a task says "see §X", that's the requirements doc unless stated otherwise.
Pre-flight
Branch: Create phase-2 from the latest main after Phase 1 has been merged. If Phase 1 is still in PR review, branch off phase-1 directly:
```bash
# Option A: after main has phase-1 merged
git checkout main && git pull && git checkout -b phase-2
# Option B: continue from phase-1 directly
git checkout phase-1 && git pull && git checkout -b phase-2
```
Schema baseline: Phase 1 leaves the DB at version 7. Phase 2 adds 0008_group_node.sql. No other migrations expected.
Pinned non-negotiables (carried forward from Phase 1):
- State changes go through the event log. Use `append_and_apply(conn, kind, payload)` for the live path; call `apply_event` only after a fresh `append_event` that returns the new id.
- Witness-filter every memory read at the SQL level (a hard `WHERE` constraint; never a soft signal).
- Edges are directed; `botA → botB` and `botB → botA` are independent records.
- Per-POV scene summaries only; never write omniscient narration.
- TDD: every task starts with a failing test.
- One commit per task minimum, more if it splits naturally.
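To make the witness rule concrete, here is a minimal sketch against an in-memory SQLite database. The `memories` table and its per-slot `witness_*` columns are hypothetical stand-ins, not the Phase 1 schema; what matters is that the witness check lives in the `WHERE` clause, so an unwitnessed row never leaves SQLite.

```python
import sqlite3

# Hypothetical schema: one 0/1 column per witness slot, in the plan's
# mask order [you, host, guest]. The real Phase 1 table will differ.
SCHEMA = """
CREATE TABLE memories (
    id INTEGER PRIMARY KEY,
    text TEXT NOT NULL,
    witness_you INTEGER NOT NULL,
    witness_host INTEGER NOT NULL,
    witness_guest INTEGER NOT NULL
);
"""

# Column names cannot be bound as SQL parameters, so they come from this
# fixed whitelist rather than from caller-supplied strings.
WITNESS_COLUMN = {
    "you": "witness_you",
    "host": "witness_host",
    "guest": "witness_guest",
}


def memories_witnessed_by(conn: sqlite3.Connection, role: str) -> list[str]:
    """Return texts of memories that `role` witnessed, oldest first.

    The filter is a hard WHERE constraint: rows the reader did not witness
    are excluded by SQLite itself, not down-ranked afterwards.
    """
    column = WITNESS_COLUMN[role]  # raises KeyError for an unknown role
    rows = conn.execute(
        f"SELECT text FROM memories WHERE {column} = 1 ORDER BY id"
    ).fetchall()
    return [text for (text,) in rows]


def demo_conn() -> sqlite3.Connection:
    """In-memory DB with one three-way memory and one you+host-only memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO memories (text, witness_you, witness_host, witness_guest) "
        "VALUES (?, ?, ?, ?)",
        [
            ("Dinner with you and BotB.", 1, 1, 1),
            ("A private talk with you.", 1, 1, 0),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = demo_conn()
    print(memories_witnessed_by(conn, "guest"))  # ['Dinner with you and BotB.']
    print(memories_witnessed_by(conn, "host"))   # both memories
```

Using one integer column per witness slot (rather than a JSON array) keeps the predicate a plain indexable comparison; that layout is this sketch's assumption, not a Phase 1 decision.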
Verification before claiming done: Use superpowers-extended-cc:verification-before-completion — run the test command, paste actual output. Don't assume green.
Parallel-Execution Strategy
This plan is structured into 6 waves of tasks. Within a wave, tasks are designed to touch disjoint files so they can be executed by parallel subagents safely. Between waves, the controller (you, the controlling Claude session) merges each subagent's commits and verifies the suite stays green before dispatching the next wave.
How to dispatch a wave in parallel
Use the Agent tool with isolation: "worktree" so each subagent gets its own git worktree. The runtime cleans up the worktree automatically if no changes are made; otherwise it returns the path + branch for the controller to merge.
In a single message, dispatch all tasks in the wave:
```
Agent({
  description: "Wave 1: T36 group_node schema",
  subagent_type: "general-purpose",
  isolation: "worktree",
  prompt: "<full task text from below>",
})
Agent({
  description: "Wave 1: T37 guest events",
  subagent_type: "general-purpose",
  isolation: "worktree",
  prompt: "<full task text from below>",
})
Agent({
  description: "Wave 1: T38 relationship-seed service",
  subagent_type: "general-purpose",
  isolation: "worktree",
  prompt: "<full task text from below>",
})
```
All three subagents start simultaneously, each working on a private worktree branched off phase-2. They cannot see each other's changes (no shared filesystem state) — that's the safety guarantee.
After a wave completes
- Each subagent returns its worktree path and commit SHA.
- Run a spec + quality reviewer subagent on each completed task (same pattern as Phase 1; see `superpowers-extended-cc:requesting-code-review`).
- Merge the wave into `phase-2` in any order (file-disjointness guarantees no conflicts). Use `--no-ff` so each task lands as its own merge commit; since every branch in a wave starts from the same `phase-2` tip, only the first could fast-forward anyway:
  ```bash
  git checkout phase-2
  for branch in <wave-1-branches>; do
    git merge --no-ff "$branch" -m "merge: <task description>"
  done
  ```
- Run the full test suite on the merged `phase-2`. If it's red, the wave's mutual-independence assumption was violated: bisect to find the offending pair, fix, and re-merge.
- Push `phase-2` to gitea so the work is durable before the next wave starts.
- Optionally clean up worktrees: `git worktree remove .worktrees/<branch>`.
Conflict prevention checklist (apply before dispatch)
For each parallel wave, verify the Files sections of all tasks in that wave have no overlapping paths. The waves below are designed to satisfy this; if you decide to add or merge tasks, re-check.
If a hot file (chat/web/turns.py, chat/services/prompt.py, chat/templates/_drawer.html) needs changes from multiple tasks, do not parallelize them — serialize within the wave or split into separate waves.
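The checklist can be mechanized. A minimal sketch, assuming each task's Files paths have already been collected into a dict (how they are extracted, e.g. from `.tasks.json`, is not specified here, and the helper name is hypothetical). It compares exact path strings only, so it will not notice two tasks that edit different files with a shared behavioral dependency.

```python
from collections import defaultdict


def overlapping_paths(wave: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return {path: sorted task ids} for each path listed by 2+ tasks.

    `wave` maps a task id (e.g. "T36") to the paths in its Files section.
    An empty result means no two tasks in the wave touch the same file.
    """
    owners: dict[str, list[str]] = defaultdict(list)
    for task_id, paths in wave.items():
        for path in set(paths):  # a task listing a path twice is not a conflict
            owners[path].append(task_id)
    return {path: sorted(ids) for path, ids in owners.items() if len(ids) > 1}


# Wave 1 as listed in the Files sections of T36-T38.
WAVE_1 = {
    "T36": [
        "chat/db/migrations/0008_group_node.sql",
        "chat/state/group_node.py",
        "tests/test_group_node.py",
    ],
    "T37": ["chat/state/world.py", "tests/test_guest_events.py"],
    "T38": [
        "chat/services/relationship_seed.py",
        "tests/test_relationship_seed.py",
    ],
}

if __name__ == "__main__":
    print(overlapping_paths(WAVE_1))  # {}
    # Hypothetical bad wave: two tasks both editing a hot file.
    print(overlapping_paths({
        "T43": ["chat/services/prompt.py"],
        "TX": ["chat/services/prompt.py", "chat/web/turns.py"],
    }))  # {'chat/services/prompt.py': ['T43', 'TX']}
```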
Failure recovery
If one subagent in a parallel wave fails (test failures, blocked, infinite loop):
- Do not block the wave on a failure. Cancel the failed subagent, merge the others' successful work, and re-dispatch the failed task as a single follow-up.
- If a failure exposes a bad assumption shared by multiple tasks (e.g. an event-payload schema mismatch), pause the wave and revisit the plan.
Why each wave is parallel-safe
| Wave | Tasks | Hot files touched | Disjoint? |
|---|---|---|---|
| 1 | T36, T37, T38 | new files only + `chat/state/world.py` (T37 only) | ✅ |
| 2 | T39, T40, T41 | new files only + `chat/services/memory_write.py` (T41 only) | ✅ |
| 3 | T42 | `chat/web/drawer.py`, `chat/templates/_drawer.html` | (single task) |
| 4a | T43, T45 | `chat/services/prompt.py` (T43), `chat/services/scene_summarize.py` (T45) | ✅ |
| 4b | T44 | `chat/web/turns.py`, `chat/services/regenerate.py` | (single task; depends on 4a) |
| 5 | T46, T47, T48 | new tests + `chat/state/entities.py` (T47) + docs (T48) | ✅ |
Task overview
```
Wave 1  ─┬─ T36: group_node schema + handler
         ├─ T37: guest_added / guest_removed events
         └─ T38: relationship-seed service ("have they met?")
Wave 2  ─┬─ T39: interjection classifier service
         ├─ T40: multi-entity state-update coordinator
         └─ T41: multi-witness memory write helper
Wave 3  ─── T42: drawer guest support (add/remove + render guest state)
Wave 4a ─┬─ T43: multi-entity prompt assembly (extends assemble_narrative_prompt)
         └─ T45: multi-entity per-POV summaries on scene close
Wave 4b ─── T44: multi-entity turn flow integration (post_turn rewrite)
Wave 5  ─┬─ T46: witness filter test coverage (cross-witness scenarios)
         ├─ T47: bot reset cascades to guest scenes
         └─ T48: Phase 2 documentation update
```
Critical path: 6 sequential merge points. Total tasks: 13. The wall-clock gain depends on subagent dispatch overhead, but in principle Wave 1's 3 tasks can run concurrently in roughly the time of the slowest one.
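The wave structure falls out of the `blockedBy` DAG that the tasks file carries: each wave is one layer of the DAG, and the number of layers is the number of sequential merge points. A minimal layering sketch follows; the function name and the dependency edges in `EXAMPLE` are illustrative assumptions (a simplified subset of this plan), not the contents of `.tasks.json`.

```python
def layer_waves(blocked_by: dict[str, set[str]]) -> list[list[str]]:
    """Split a dependency DAG into waves (Kahn-style layering).

    Each wave holds every task whose blockers all appear in earlier waves,
    so tasks within one wave never depend on each other.
    Raises ValueError on a cycle or on a blocker missing from the mapping.
    """
    remaining = {task: set(deps) for task, deps in blocked_by.items()}
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        ready = sorted(t for t, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError(f"unsatisfiable blockers among {sorted(remaining)}")
        waves.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return waves


# Illustrative subset of the plan with simplified, assumed dependencies.
EXAMPLE = {
    "T36": set(), "T37": set(), "T38": set(),
    "T42": {"T36", "T37", "T38"},
    "T43": {"T42"}, "T45": {"T42"},
    "T44": {"T43", "T45"},
}

if __name__ == "__main__":
    for i, wave in enumerate(layer_waves(EXAMPLE), start=1):
        print(i, wave)
```

Running it groups T43 and T45 together and puts T44 after them, which is the same 4a/4b split the plan uses.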
Wave 1 — Foundation
These three tasks are fully independent: T36 adds new files only, T37 modifies chat/state/world.py (additive event handlers), T38 adds new files only. Dispatch all three in parallel.
Task 36: Group node schema + handler
Files:
- Create: `chat/db/migrations/0008_group_node.sql`
- Create: `chat/state/group_node.py`
- Create: `tests/test_group_node.py`
Spec: Adds the group_node table (one row per chat, populated when all three entities are present in a scene) and a projector handler for the group_node_initialized event. The group node carries the shared summary, group dynamic, inside jokes, and active threads (Phase 3 will populate active_threads; for Phase 2, just summary and dynamic matter).
Step 1: Write the failing test
```python
# tests/test_group_node.py
from chat.db.migrate import apply_migrations
from chat.db.connection import open_db
from chat.eventlog.log import append_event
from chat.eventlog.projector import project
from chat.state.group_node import get_group_node
import chat.state.entities  # noqa
import chat.state.world  # noqa
import chat.state.group_node  # noqa: F401 - registers handlers


def test_group_node_initialized_creates_row(tmp_path):
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        # Seed bot and chat (minimal world state; no scene yet)
        append_event(conn, kind="bot_authored", payload={
            "id": "bot_a", "name": "BotA", "persona": "...",
            "voice_samples": [], "traits": [], "backstory": "",
            "initial_relationship_to_you": "", "kickoff_prose": "",
        })
        append_event(conn, kind="chat_created", payload={
            "id": "chat_bot_a", "host_bot_id": "bot_a",
            "initial_time": "2026-04-26T20:00:00+00:00",
            "narrative_anchor": "Day 1", "weather": "",
        })
        append_event(conn, kind="group_node_initialized", payload={
            "chat_id": "chat_bot_a",
            "members": ["you", "bot_a", "bot_b"],
            "summary": "",
            "dynamic": "",
        })
        project(conn)
        gn = get_group_node(conn, "chat_bot_a")
        assert gn is not None
        assert gn["members"] == ["you", "bot_a", "bot_b"]
        assert gn["summary"] == ""
```
Step 2: Run test to verify it fails
```bash
.venv/bin/pytest tests/test_group_node.py -v
```
Expected: ModuleNotFoundError: No module named 'chat.state.group_node'.
Step 3: Write minimal implementation
chat/db/migrations/0008_group_node.sql:
```sql
CREATE TABLE group_node (
    chat_id TEXT PRIMARY KEY,
    members_json TEXT NOT NULL,
    summary TEXT NOT NULL DEFAULT '',
    dynamic TEXT NOT NULL DEFAULT '',
    threads_json TEXT NOT NULL DEFAULT '[]',
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
```
chat/state/group_node.py:
```python
from __future__ import annotations

import json
from sqlite3 import Connection

from chat.eventlog.projector import on
from chat.eventlog.log import Event


@on("group_node_initialized")
def _apply_group_node_initialized(conn: Connection, e: Event) -> None:
    p = e.payload
    conn.execute(
        "INSERT OR REPLACE INTO group_node "
        "(chat_id, members_json, summary, dynamic, threads_json) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            p["chat_id"],
            json.dumps(p["members"]),
            p.get("summary", ""),
            p.get("dynamic", ""),
            json.dumps(p.get("threads", [])),
        ),
    )


@on("group_node_updated")
def _apply_group_node_updated(conn: Connection, e: Event) -> None:
    """T45 calls this on scene close to rewrite summary + dynamic."""
    p = e.payload
    conn.execute(
        "UPDATE group_node SET summary = ?, dynamic = ?, updated_at = datetime('now') "
        "WHERE chat_id = ?",
        (p.get("summary", ""), p.get("dynamic", ""), p["chat_id"]),
    )


def get_group_node(conn: Connection, chat_id: str) -> dict | None:
    row = conn.execute(
        "SELECT chat_id, members_json, summary, dynamic, threads_json, updated_at "
        "FROM group_node WHERE chat_id = ?",
        (chat_id,),
    ).fetchone()
    if not row:
        return None
    return {
        "chat_id": row[0],
        "members": json.loads(row[1]),
        "summary": row[2],
        "dynamic": row[3],
        "threads": json.loads(row[4]),
        "updated_at": row[5],
    }
```
Step 4: Run test to verify it passes
```bash
.venv/bin/pytest tests/test_group_node.py -v
```
Expected: 1 passed.
Step 5: Commit
```bash
git add chat/db/migrations/0008_group_node.sql chat/state/group_node.py tests/test_group_node.py
git commit -m "feat: group_node schema + projector handlers"
```
Notes for the implementer:
- Add a second test for `group_node_updated`: append init then update, assert `summary` and `dynamic` change but `members` stays.
- Add a test for `get_group_node` returning `None` on a missing chat_id.
- Schema version after migration: 8. The migration runner handles this automatically; no test assertion on schema_version.
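Before wiring the projector, the SQL itself can be sanity-checked in isolation. This standalone sketch replays the two statements the handlers above execute against the migration's DDL in an in-memory SQLite database. It does not exercise `@on`, `project`, or `get_group_node` (the helper name `run_group_node_statements` is hypothetical), so it supplements rather than replaces the pytest tests in the notes.

```python
import json
import sqlite3

# Copied from 0008_group_node.sql above.
DDL = """
CREATE TABLE group_node (
    chat_id TEXT PRIMARY KEY,
    members_json TEXT NOT NULL,
    summary TEXT NOT NULL DEFAULT '',
    dynamic TEXT NOT NULL DEFAULT '',
    threads_json TEXT NOT NULL DEFAULT '[]',
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
"""


def run_group_node_statements() -> tuple[dict, object]:
    """Replay the init + update statements; return (row, missing_lookup)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    # Statement from _apply_group_node_initialized.
    conn.execute(
        "INSERT OR REPLACE INTO group_node "
        "(chat_id, members_json, summary, dynamic, threads_json) "
        "VALUES (?, ?, ?, ?, ?)",
        ("chat_bot_a", json.dumps(["you", "bot_a", "bot_b"]), "", "", "[]"),
    )
    # Statement from _apply_group_node_updated.
    conn.execute(
        "UPDATE group_node SET summary = ?, dynamic = ?, updated_at = datetime('now') "
        "WHERE chat_id = ?",
        ("They argued about the trip.", "tense but fond", "chat_bot_a"),
    )
    members_json, summary, dynamic = conn.execute(
        "SELECT members_json, summary, dynamic FROM group_node WHERE chat_id = ?",
        ("chat_bot_a",),
    ).fetchone()
    missing = conn.execute(
        "SELECT 1 FROM group_node WHERE chat_id = ?", ("no_such_chat",)
    ).fetchone()
    conn.close()
    row = {"members": json.loads(members_json), "summary": summary, "dynamic": dynamic}
    return row, missing
```

One consequence worth knowing: the `UPDATE` for a chat with no `group_node` row silently touches zero rows, which is why T45 only emits `group_node_updated` when the node exists.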
Task 37: Guest add / remove events + handlers
Files:
- Modify: `chat/state/world.py` (add `_apply_guest_added` and `_apply_guest_removed` handlers; both update `chats.guest_bot_id`)
- Create: `tests/test_guest_events.py`
Spec: Two new event kinds.
- `guest_added` payload: `{chat_id, guest_bot_id}`. Handler sets `chats.guest_bot_id = ?`.
- `guest_removed` payload: `{chat_id}`. Handler sets `chats.guest_bot_id = NULL`.
These are pure state mutations — no related side effects. The kickoff parse-and-confirm flow (T13, Phase 1) and the new T39 interjection / T42 drawer routes will append these events.
Step 1: Write the failing test
```python
# tests/test_guest_events.py
from chat.db.migrate import apply_migrations
from chat.db.connection import open_db
from chat.eventlog.log import append_event
from chat.eventlog.projector import project
from chat.state.world import get_chat
import chat.state.entities  # noqa
import chat.state.world  # noqa


def test_guest_added_sets_guest_bot_id(tmp_path):
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        # Seed bots, chat
        append_event(conn, kind="bot_authored", payload={
            "id": "bot_a", "name": "BotA", "persona": "...",
            "voice_samples": [], "traits": [], "backstory": "",
            "initial_relationship_to_you": "", "kickoff_prose": "",
        })
        append_event(conn, kind="bot_authored", payload={
            "id": "bot_b", "name": "BotB", "persona": "...",
            "voice_samples": [], "traits": [], "backstory": "",
            "initial_relationship_to_you": "", "kickoff_prose": "",
        })
        append_event(conn, kind="chat_created", payload={
            "id": "chat_bot_a", "host_bot_id": "bot_a",
            "initial_time": "2026-04-26T20:00:00+00:00",
            "narrative_anchor": "Day 1", "weather": "",
        })
        append_event(conn, kind="guest_added", payload={
            "chat_id": "chat_bot_a", "guest_bot_id": "bot_b",
        })
        project(conn)
        chat = get_chat(conn, "chat_bot_a")
        assert chat["guest_bot_id"] == "bot_b"


def test_guest_removed_clears_guest_bot_id(tmp_path):
    # similar: add then remove, assert guest_bot_id is None
    ...
```
Step 3: Implementation
In chat/state/world.py, add the two handlers next to _apply_chat_created:
```python
@on("guest_added")
def _apply_guest_added(conn: Connection, e: Event) -> None:
    p = e.payload
    conn.execute(
        "UPDATE chats SET guest_bot_id = ? WHERE id = ?",
        (p["guest_bot_id"], p["chat_id"]),
    )


@on("guest_removed")
def _apply_guest_removed(conn: Connection, e: Event) -> None:
    p = e.payload
    conn.execute(
        "UPDATE chats SET guest_bot_id = NULL WHERE id = ?",
        (p["chat_id"],),
    )
```
Step 5: Commit
```bash
git add chat/state/world.py tests/test_guest_events.py
git commit -m "feat: guest_added / guest_removed event handlers"
```
Notes:
- 2 tests minimum (added, removed). Optional third: idempotent re-add (overwrites cleanly).
- Don't add any UI here — T42 handles UI.
Task 38: Relationship-seed service ("have they met?")
Files:
- Create: `chat/services/relationship_seed.py`
- Create: `tests/test_relationship_seed.py`
Spec: Per requirements §5.2: when two bots first co-appear in a chat, prompt the user with "Have they met before? If yes, write a short prose seed describing how." The seed is parsed via classifier into structured botA ↔ botB edge content (summary + initial knowledge facts).
This task adds the service layer only. T39 (interjection) doesn't touch this; T42 (drawer guest UI) calls it via a route added there. So at the service level, we just expose:
```python
async def seed_inter_bot_edges(
    client: LLMClient,
    *,
    classifier_model: str,
    bot_a_id: str,
    bot_a_name: str,
    bot_b_id: str,
    bot_b_name: str,
    relationship_prose: str,  # user-supplied prose; empty = "they haven't met"
    timeout_s: float = 30.0,
) -> RelationshipSeed:
    """Parse user-supplied prose into structured edge content for both
    directed pairs (bot_a → bot_b and bot_b → bot_a). Return the
    RelationshipSeed; caller is responsible for emitting two edge_update
    events."""
```
RelationshipSeed:
```python
from pydantic import BaseModel, Field


class RelationshipSeed(BaseModel):
    a_to_b_summary: str = ""
    a_to_b_knowledge_facts: list[str] = Field(default_factory=list)
    a_to_b_affinity_delta: int = 0  # signed, -10..+10 typical
    a_to_b_trust_delta: int = 0
    b_to_a_summary: str = ""
    b_to_a_knowledge_facts: list[str] = Field(default_factory=list)
    b_to_a_affinity_delta: int = 0
    b_to_a_trust_delta: int = 0
```
If relationship_prose is empty/whitespace, short-circuit and return an empty RelationshipSeed (they haven't met → fresh edges with default 50/50).
Step 1: Failing test
```python
import json

import pytest

from chat.llm.mock import MockLLMClient
from chat.services.relationship_seed import seed_inter_bot_edges, RelationshipSeed


@pytest.mark.asyncio
async def test_seed_parses_canned_prose():
    canned = json.dumps({
        "a_to_b_summary": "BotA and BotB went to college together.",
        "a_to_b_knowledge_facts": ["BotB has a younger brother."],
        "a_to_b_affinity_delta": 5,
        "a_to_b_trust_delta": 3,
        "b_to_a_summary": "BotB sees BotA as the responsible one.",
        "b_to_a_knowledge_facts": ["BotA was once a TA."],
        "b_to_a_affinity_delta": 4,
        "b_to_a_trust_delta": 5,
    })
    mock = MockLLMClient(canned=[canned])
    seed = await seed_inter_bot_edges(
        mock, classifier_model="x",
        bot_a_id="bot_a", bot_a_name="BotA",
        bot_b_id="bot_b", bot_b_name="BotB",
        relationship_prose="They went to college together; BotB still sees BotA as the responsible one.",
    )
    assert "college" in seed.a_to_b_summary
    assert seed.a_to_b_affinity_delta == 5


@pytest.mark.asyncio
async def test_seed_empty_prose_returns_empty():
    mock = MockLLMClient(canned=[])  # never called
    seed = await seed_inter_bot_edges(
        mock, classifier_model="x",
        bot_a_id="bot_a", bot_a_name="BotA",
        bot_b_id="bot_b", bot_b_name="BotB",
        relationship_prose="",
    )
    assert seed == RelationshipSeed()
```
Step 3: Minimal impl
Wraps classify() from chat.llm.classify with a RelationshipSeed schema and a system prompt explaining the task.
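A self-contained sketch of the service's control flow, under stated assumptions: `classify` here is a hypothetical injected coroutine `(system_prompt, user_text) -> dict` standing in for `chat.llm.classify` (whose real signature isn't shown), `RelationshipSeed` is re-declared as a dataclass with the same fields and defaults so the sketch runs without pydantic, and the `edge_update` payload keys in `seed_to_edge_payloads` are illustrative. The fallback to an empty seed on classifier failure is also an assumption; the spec only defines the blank-prose short-circuit.

```python
import asyncio
from dataclasses import dataclass, field, fields
from typing import Awaitable, Callable


@dataclass
class RelationshipSeed:
    # Dataclass stand-in for the pydantic model above (same fields, defaults).
    a_to_b_summary: str = ""
    a_to_b_knowledge_facts: list[str] = field(default_factory=list)
    a_to_b_affinity_delta: int = 0
    a_to_b_trust_delta: int = 0
    b_to_a_summary: str = ""
    b_to_a_knowledge_facts: list[str] = field(default_factory=list)
    b_to_a_affinity_delta: int = 0
    b_to_a_trust_delta: int = 0


# Hypothetical classifier hook: (system_prompt, user_text) -> parsed JSON dict.
ClassifyFn = Callable[[str, str], Awaitable[dict]]


async def seed_inter_bot_edges(
    classify: ClassifyFn,
    *,
    bot_a_name: str,
    bot_b_name: str,
    relationship_prose: str,
    timeout_s: float = 30.0,
) -> RelationshipSeed:
    if not relationship_prose.strip():
        return RelationshipSeed()  # "they haven't met": no classifier call
    system = (
        f"Extract how {bot_a_name} and {bot_b_name} know each other. "
        "Fill both directions (a_to_b = how A sees B, b_to_a = how B sees A) "
        "with a summary, knowledge facts, and small signed affinity/trust deltas. "
        "Reply with JSON only."
    )
    try:
        raw = await asyncio.wait_for(classify(system, relationship_prose), timeout_s)
        known = {f.name for f in fields(RelationshipSeed)}
        return RelationshipSeed(**{k: v for k, v in raw.items() if k in known})
    except Exception:
        # Assumption: any classifier failure degrades to "haven't met".
        return RelationshipSeed()


def seed_to_edge_payloads(
    seed: RelationshipSeed, *, bot_a_id: str, bot_b_id: str
) -> list[dict]:
    """Two directed edge_update payloads; the key names are hypothetical."""
    return [
        {
            "source_id": bot_a_id, "target_id": bot_b_id,
            "summary": seed.a_to_b_summary,
            "knowledge_facts": seed.a_to_b_knowledge_facts,
            "affinity_delta": seed.a_to_b_affinity_delta,
            "trust_delta": seed.a_to_b_trust_delta,
        },
        {
            "source_id": bot_b_id, "target_id": bot_a_id,
            "summary": seed.b_to_a_summary,
            "knowledge_facts": seed.b_to_a_knowledge_facts,
            "affinity_delta": seed.b_to_a_affinity_delta,
            "trust_delta": seed.b_to_a_trust_delta,
        },
    ]
```

Dropping unknown keys before constructing the seed keeps a chatty classifier from crashing the route; pydantic's own validation would play that role in the real model.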
Step 5: Commit
```bash
git add chat/services/relationship_seed.py tests/test_relationship_seed.py
git commit -m "feat: relationship-seed service for first-co-appearance prompt"
```
Wave 2 — Services
After Wave 1 merges, dispatch Wave 2 in parallel. T39 and T40 are new files; T41 modifies chat/services/memory_write.py (additive — adds a new function alongside existing record_turn_memory).
Task 39: Interjection classifier service
Files:
- Create: `chat/services/interjection.py`
- Create: `tests/test_interjection.py`
Spec: Per requirements §6.2: when a guest is present and the addressee bot has just spoken, decide whether the non-addressee bot interjects. Classifier returns {should_interject: bool, reason: str}. Caller (T44 turn flow) generates the interjection beat as a brief follow-on response if should_interject.
Public API:
```python
class InterjectionDecision(BaseModel):
    should_interject: bool = False
    reason: str = ""


async def detect_interjection(
    client: LLMClient,
    *,
    classifier_model: str,
    addressee_name: str,
    addressee_just_said: str,
    silent_witness_name: str,
    silent_witness_persona: str,
    silent_witness_edge_to_addressee: dict,  # {affinity, trust, summary}
    silent_witness_edge_to_you: dict,
    you_just_said: str,
    timeout_s: float = 30.0,
) -> InterjectionDecision:
    """Decide whether the silent witness bot interjects after the addressee
    finishes speaking. Conservative bias: most turns should NOT interject
    (return False). Trigger only when the witness's character would
    plausibly speak up: jealousy, surprise, agreement worth voicing,
    correcting a falsehood, etc.
    """
```
Classifier system prompt should explicitly bias toward should_interject=false (per spec: "addressee gets the floor"; interjection is the exception).
Tests: 3 minimum.
- Mock returns `{should_interject: true, reason: "..."}` → result is True.
- Mock returns `{should_interject: false}` → result is False.
- Classifier failure → fallback default (`should_interject=false`, `reason="fallback"`).
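A minimal sketch of the conservative-default behavior those tests pin down. Assumptions: `classify` is a hypothetical injected coroutine `(system_prompt, user_text) -> dict` in place of the real `LLMClient` plumbing, the persona and edge parameters are omitted for brevity, and `InterjectionDecision` is re-declared as a frozen dataclass so the sketch runs standalone. Treating anything other than a literal `true` (for example the string `"yes"`) as "don't interject" is this sketch's design choice.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass(frozen=True)
class InterjectionDecision:
    should_interject: bool = False
    reason: str = ""


FALLBACK = InterjectionDecision(should_interject=False, reason="fallback")

# Hypothetical classifier hook: (system_prompt, user_text) -> parsed JSON dict.
ClassifyFn = Callable[[str, str], Awaitable[dict]]


async def detect_interjection(
    classify: ClassifyFn,
    *,
    addressee_name: str,
    addressee_just_said: str,
    silent_witness_name: str,
    you_just_said: str,
    timeout_s: float = 30.0,
) -> InterjectionDecision:
    system = (
        f"{addressee_name} has the floor. Decide whether {silent_witness_name} "
        "interjects. Default to should_interject=false; answer true only if "
        f"{silent_witness_name} would plausibly speak up (jealousy, surprise, "
        "agreement worth voicing, correcting a falsehood). "
        'Reply as JSON: {"should_interject": bool, "reason": str}.'
    )
    user = f"You said: {you_just_said}\n{addressee_name} said: {addressee_just_said}"
    try:
        raw = await asyncio.wait_for(classify(system, user), timeout_s)
        # Only a literal JSON true counts; anything else stays conservative.
        return InterjectionDecision(
            should_interject=raw.get("should_interject") is True,
            reason=str(raw.get("reason", "")),
        )
    except Exception:
        # Timeouts, transport errors, malformed output: never interject.
        return FALLBACK
```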
Commit: feat: interjection classifier service
Task 40: Multi-entity state-update coordinator
Files:
- Create: `chat/services/multi_state_update.py`
- Create: `tests/test_multi_state_update.py`
Spec: Wraps the existing chat.services.state_update.compute_state_update (single-pair) into a coordinator that runs state updates for all directed pairs of present entities. With 3 entities (you, host, guest), that's 6 pairs:
you → host, host → you
you → guest, guest → you
host → guest, guest → host
Returns a list of (source_id, target_id, StateUpdate) tuples; caller (T44) emits one edge_update event per tuple via append_and_apply.
Public API:
```python
async def compute_state_updates_for_present(
    client: LLMClient,
    *,
    classifier_model: str,
    present_ids: list[str],  # e.g. ["you", "bot_a", "bot_b"]
    present_names: dict[str, str],  # id -> display name
    personas: dict[str, str],  # id -> persona blob
    prior_edges: dict[tuple[str, str], dict],  # (src, tgt) -> {affinity, trust, summary}
    recent_dialogue: list[dict],  # [{speaker, text}, ...]
    timeout_s: float = 30.0,
) -> list[tuple[str, str, StateUpdate]]:
    """Run compute_state_update for every directed pair where source != target.
    Returns a list of (source_id, target_id, update) tuples. Self-pairs
    (e.g. you -> you) are skipped.
    """
```
Implementation: nested loops over present_ids, sequential calls to compute_state_update (parallel calls would exceed the Featherless 2-connection cap from the FeatherlessClient semaphore).
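The pair enumeration is exactly `itertools.permutations(present_ids, 2)`: ordered pairs of distinct entities, n(n-1) of them, so 2 for you+host and 6 for you+host+guest. A minimal sketch of the sequential coordinator, assuming a hypothetical `update_pair` coroutine in place of `compute_state_update` and a dict in place of `StateUpdate` (the neutral default shape is also assumed):

```python
import asyncio
from itertools import permutations
from typing import Awaitable, Callable

# Hypothetical single-pair hook standing in for compute_state_update:
# (source_id, target_id) -> update dict.
PairUpdater = Callable[[str, str], Awaitable[dict]]

DEFAULT_UPDATE = {"affinity_delta": 0, "trust_delta": 0}  # assumed neutral shape


def directed_pairs(present_ids: list[str]) -> list[tuple[str, str]]:
    """All ordered (source, target) pairs with source != target: n * (n - 1)."""
    return list(permutations(present_ids, 2))


async def compute_for_present(
    update_pair: PairUpdater, present_ids: list[str]
) -> list[tuple[str, str, dict]]:
    results: list[tuple[str, str, dict]] = []
    for source, target in directed_pairs(present_ids):
        try:
            # Awaited one at a time, so at most one classifier call is in flight.
            update = await update_pair(source, target)
        except Exception:
            # Belt and braces: the real per-pair call already has a fallback.
            update = dict(DEFAULT_UPDATE)
        results.append((source, target, update))
    return results
```

`permutations` never pairs an element with itself, so no explicit `source != target` check is needed as long as `present_ids` has no duplicates.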
Tests: 3 minimum.
- With 2 present (you, host) → returns 2 updates (existing 1A/2D parity).
- With 3 present (you, host, guest) → returns 6 updates, one per directed non-self pair.
- Failures in one pair don't kill the whole batch (per-pair `compute_state_update` already has a default fallback).
Commit: feat: multi-entity state-update coordinator
Task 41: Multi-witness memory write helper
Files:
- Modify: `chat/services/memory_write.py` (add `record_turn_memory_for_present` alongside the existing `record_turn_memory`; do NOT remove or change `record_turn_memory`)
- Add tests to: `tests/test_memory_write.py`
Spec: Currently Phase 1's record_turn_memory(conn, *, chat_id, host_bot_id, narrative_text, ...) writes a single memory event for the host bot's POV. With a guest present, we need:
- One memory in the host's store (witness mask `[1, 1, 1]` if you/host/guest are present)
- One memory in the guest's store (same witness mask, owner = `guest_bot_id`)
"You" still doesn't have a memory store in v1 (per §5.4 / §11.2).
New helper:
```python
def record_turn_memory_for_present(
    conn,
    *,
    chat_id: str,
    host_bot_id: str,
    guest_bot_id: str | None,
    narrative_text: str,
    scene_id: int | None = None,
    chat_clock_at: str | None = None,
    source: str = "direct",
    significance: int = 1,
) -> dict[str, tuple[int, int]]:
    """Write a memory_written event for each present bot witness (host
    always; guest if guest_bot_id is not None). Returns {bot_id:
    (event_id, memory_id)}.
    Witness mask is [1, 1, 1] when guest is present, [1, 1, 0] otherwise
    (mirrors Phase 1 single-bot behavior when guest_bot_id is None).
    """
```
Implementation: appends one memory_written event per present bot, calling append_and_apply for each, and queries the resulting memories.id per owner+chat just like Phase 1's record_turn_memory.
Tests: 3 minimum, added to tests/test_memory_write.py:
- With `guest_bot_id=None`, behaves identically to `record_turn_memory` (one memory for host, witness `[1, 1, 0]`).
- With `guest_bot_id="bot_b"`, writes two memories, one each for host and guest, both with witness `[1, 1, 1]`.
- Returned dict keys match `{host_bot_id, guest_bot_id}` (or just `{host_bot_id}` when no guest).
Commit: feat: multi-witness memory write helper
Wave 3 — Drawer guest support (single task)
This wave is one task because all Phase 2 drawer work touches the same two files (chat/web/drawer.py and chat/templates/_drawer.html). Splitting would force serial execution with conflict resolution. Single-task wave runs alone.
Task 42: Drawer guest support (add/remove + render)
Files:
- Modify:
chat/web/drawer.py(addPOST /chats/{chat_id}/drawer/guest/add,POST /chats/{chat_id}/drawer/guest/remove; extenddrawerGET handler to query guest state when present) - Modify:
chat/templates/_drawer.html(render guest activity, guest edges, group node summary; add "Add guest" form and "Remove guest" button when applicable) - Create:
tests/test_drawer_guest.py
Spec:
GET /chats/{chat_id}/drawer (extend, don't replace):
- Read `chat["guest_bot_id"]` from the existing `get_chat` query.
- If a guest is present, also fetch `get_bot(conn, guest_bot_id)`, `get_activity(conn, guest_bot_id)`, edges in both `host ↔ guest` directions, edges in both `you ↔ guest` directions, and `get_group_node(conn, chat_id)`.
- Pass all of this to the template.
Template changes:
- New section "Guest" rendering guest's name, activity, and the four edges involving the guest.
- New section "Group" rendering `group_node.summary` and `group_node.dynamic` when present.
- "Add guest" button → expands a form with: bot selector (dropdown of authored bots not currently in this chat) + relationship prose textarea (the "have they met?" prompt).
- "Remove guest" button visible when a guest is present.
POST /chats/{chat_id}/drawer/guest/add route:
- Read form: `guest_bot_id`, `relationship_prose`.
- 404 if chat or guest_bot is missing.
- 400 if `guest_bot_id == host_bot_id`.
- 400 if a guest is already present.
- Call `seed_inter_bot_edges` (T38) with the prose. May produce an empty seed if the prose is blank.
- Append events: `guest_added`, then up to 2 `edge_update` events (host ↔ guest deltas from the seed). Use `append_and_apply` for each.
- If all 3 entities are now present and no `group_node` row exists for this chat, append `group_node_initialized` with members=[you, host, guest] and empty summary/dynamic.
- Return refreshed drawer partial.
POST /chats/{chat_id}/drawer/guest/remove route:
- 404 if chat missing; 400 if no guest present.
- Append `scene_closed` for the active scene (per §7.5: removing the guest closes the current scene).
- Append `guest_removed`.
- (Per §7.5 the host's chat then implicitly opens a new scene with you+host. For Phase 2, leave that as a manual "next user message creates the new scene", same as Phase 1 mid-chat reset semantics. Phase 3 may auto-open.)
- Return refreshed drawer partial.
Tests (tests/test_drawer_guest.py): 6 minimum.
- GET drawer with no guest → no "Guest" section in body.
- POST add guest → 303-or-200 with refreshed drawer; `chat.guest_bot_id` is set; `group_node` row created; relationship-seed mock returns canned values; edges have the seeded values.
- POST add guest with empty `relationship_prose` → guest added; `seed_inter_bot_edges` short-circuits; edges remain at default 50/50.
- POST add guest when one is already present → 400.
- POST remove guest → guest_bot_id NULL, scene_closed event written.
- GET drawer with guest present → "Guest" section + group_node summary visible.
Commit: feat: drawer guest add/remove + render
Notes for implementer:
- The guest-bot-selector dropdown lists bots from `list_bots(conn)` minus the host. Don't filter for "bots not in any chat"; guests can be in multiple chats simultaneously (each chat has its own scene state).
- The "have they met?" prose textarea is the per-pair prompt. v1 only fires it on first co-appearance globally; for v2, fire it every time a `(host, guest)` pair has no existing `host → guest` edge. After the first add, the edge exists, so subsequent adds skip the prose (or render it disabled with "you've already met"). Treat this as Phase 2.5 polish if it gets fiddly; for T42, just always show the prose textarea, blank by default.
- The drawer route already uses `Depends(get_conn)` and templates; reuse the existing dependency and `TEMPLATES` instance.
Wave 4a — Multi-entity prompt + scene close (parallel)
T43 and T45 touch different files (prompt.py and scene_summarize.py). Dispatch both in parallel.
Task 43: Multi-entity prompt assembly
Files:
- Modify: `chat/services/prompt.py` (extend `assemble_narrative_prompt` to handle a `guest_id` parameter and fetch guest activity, guest edges, and the group node into the prompt blocks)
- Add tests to: `tests/test_prompt.py`
Spec: The current `assemble_narrative_prompt(conn, *, chat_id, speaker_bot_id, addressee="you", ...)` only handles you+host. Extend:
- Accept a `guest_id: str | None = None` parameter (auto-fetched from `chat.guest_bot_id` if not passed; explicit override for tests).
- When `guest_id` is provided:
  - Activity block includes the guest's activity (`get_activity(conn, guest_id)`).
  - If `speaker_bot_id == guest_id`, the addressee defaults to "you" but the caller can override.
  - "Speaker's other edges" SHOULD-tier block includes speaker → non-addressee (e.g., host → guest if the speaker is the host and the addressee is you).
  - MUST-tier identity block unchanged (still just the speaker).
  - Group-node summary becomes a SHOULD-tier block when all three are present (after MUST, before retrieved memories).
- Token budget tier ordering unchanged.
Tests: 4 minimum, added to tests/test_prompt.py:
- With `guest_id=None`, output matches existing 2-entity behavior (regression).
- With `guest_id="bot_b"` present and group_node populated, the assembled system message contains: speaker identity, guest activity, group_node summary, host→guest edge for the speaker.
- Speaker is the guest (`speaker_bot_id == guest_id`), addressee="you" → guest's edges and group node correctly oriented.
- Tight budget forces NICE-trim of guest activity → MUST blocks (speaker identity, edge_to_addressee, last 4 turns) survive.
Commit: feat: multi-entity prompt assembly with guest activity, edges, group node
Task 45: Multi-entity per-POV summaries on scene close
Files:
- Modify: `chat/services/scene_summarize.py` (extend `apply_scene_close_summary` to write per-POV summaries for each present witness with a memory store, not just the host)
- Modify: tests in `tests/test_per_pov_summary.py`
Spec: Phase 1's apply_scene_close_summary only summarizes from the host bot's POV. For Phase 2:
- Determine present witnesses with memory stores: host always; guest if `chat.guest_bot_id is not None`.
- For each, generate an independent per-POV summary via `summarize_scene` (the existing classifier wrapper). Each call uses that bot's persona, `you_name`, prior `bot → you` edge summary, and the same dialogue.
- Update each owner's memories of the closing scene with their per-POV summary.
- Update all directed `bot → you` edges with per-POV-derived `summary` content.
- If `group_node` exists for this chat, also append a `group_node_updated` event with new `summary` and `dynamic` derived from the group view (run `summarize_scene` once with `bot_name="group"`, `bot_persona="all participants"` for a meta-summary). For v1 simplicity, the meta-summary can be a naive concat of the host's per-POV summary + the guest's per-POV summary; a full LLM-merged group view is deferred to Phase 2.5.
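A minimal sketch of the naive-concat option. One detail matters here: the `group_node_updated` handler in T36 writes `p.get("dynamic", "")`, so a payload that omits `dynamic` would blank the stored value. Since naive concat produces no new dynamic, this sketch carries the prior one forward; that choice, and both helper names, are assumptions.

```python
def naive_group_summary(per_pov: dict[str, str], names: dict[str, str]) -> str:
    """Phase 2 stand-in for an LLM-merged group view: label each bot's
    per-POV summary with its display name and join them, skipping blanks."""
    return "\n".join(
        f"{names.get(bot_id, bot_id)}: {text.strip()}"
        for bot_id, text in per_pov.items()
        if text.strip()
    )


def group_node_updated_payload(
    chat_id: str,
    per_pov: dict[str, str],
    names: dict[str, str],
    prior_dynamic: str,
) -> dict:
    # The group_node_updated handler writes p.get("dynamic", ""), so omitting
    # the key would blank the stored dynamic. Naive mode has no new dynamic
    # of its own, so the prior value is carried forward (an assumption).
    return {
        "chat_id": chat_id,
        "summary": naive_group_summary(per_pov, names),
        "dynamic": prior_dynamic,
    }
```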
Tests: 4 minimum, added to tests/test_per_pov_summary.py:
- With no guest, behavior matches Phase 1 (regression test).
- With guest, `apply_scene_close_summary` calls `summarize_scene` twice (once per bot witness); assert the mock is called 2x. This count assumes the naive-concat group summary; an LLM-generated meta-summary would add a third call.
- After close, each bot's memories of the closed scene have their respective per-POV summary (different text).
- With group_node present, after close `get_group_node(conn, chat_id)["summary"]` is updated.
Commit: feat: per-POV summaries on close for each present witness
Wave 4b — Turn flow integration (single task; depends on 4a)
T44 ties everything together. It modifies `chat/web/turns.py` (`post_turn`) and `chat/services/regenerate.py` to use the new multi-entity primitives. It must run after Wave 4a is merged, so that `assemble_narrative_prompt` accepts `guest_id` and `apply_scene_close_summary` handles the guest.
Task 44: Multi-entity turn flow
Files:
- Modify: `chat/web/turns.py` (rewrite `post_turn` to: parse turn → optionally close scene → assemble prompt with guest → narrative stream → write memories for ALL witnesses → state updates for ALL pairs → interjection check + interjection narrative if needed)
- Modify: `chat/services/regenerate.py` (mirror the changes: a regenerated turn rebuilds with the guest in scope)
- Modify: tests in `tests/test_turn_flow.py` (add multi-entity scenarios)
Spec: Refactored post_turn flow:
1. Validate prose (existing 400 check).
2. Look up chat, host_bot, guest_bot (None if no guest).
3. Parse turn (existing parse_turn).
4. Append user_turn event.
5. Append assistant_turn_started.
6. Detect scene close (existing path; runs even with guest).
7. Read recent dialogue with the same query as Phase 1, keeping in mind that multiple witnesses are now present.
8. Determine ADDRESSEE with the simplest v2 heuristic: the addressee is the host unless the
prose explicitly names guest_bot.name. Pass it to assemble_narrative_prompt.
9. Assemble narrative prompt with speaker=addressee, guest_id passed.
10. Stream narrative; broadcast tokens; commit assistant_turn (existing).
11. Write memories: record_turn_memory_for_present(host, guest).
12. State updates: compute_state_updates_for_present, then append_and_apply
one edge_update per pair.
13. INTERJECTION CHECK (only if a guest is present; the silent witness is whichever bot
was not the addressee):
a. Call detect_interjection with the silent witness as candidate.
b. If should_interject: assemble a narrative prompt with speaker=silent_witness,
addressee=host (or whoever just spoke), and instruct the speaker to keep it brief.
c. Stream second narrative; broadcast as second turn_html; commit second
assistant_turn event.
d. Run state updates + memory writes for the interjection turn too
(smaller scope — just the interjector's outgoing edges + memories).
14. Scene close summary (existing path; now multi-witness via T45).
15. Broadcast turn_html for primary + interjection (if any).
16. Return 204.
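Steps 12 and 13d differ only in which directed pairs they update. The following is a minimal sketch that assumes entities are identified by plain strings (`"you"` plus the two bot ids). `primary_pairs`, `interjection_pairs`, and `silent_witness` are hypothetical helpers, not existing code. Three entities give 3 × 2 = 6 directed pairs for the primary turn. The interjection turn touches only the interjector's 2 outgoing edges, which matches the fixture's state_updates×2.

```python
from itertools import permutations
from typing import List, Tuple


def primary_pairs(host_id: str, guest_id: str) -> List[Tuple[str, str]]:
    """Every directed (from, to) pair among you + host + guest: 3 * 2 = 6."""
    return list(permutations(["you", host_id, guest_id], 2))


def interjection_pairs(interjector_id: str, host_id: str, guest_id: str) -> List[Tuple[str, str]]:
    """Only the interjector's outgoing edges (the smaller scope of step 13d)."""
    present = ["you", host_id, guest_id]
    return [(interjector_id, other) for other in present if other != interjector_id]


def silent_witness(addressee_id: str, host_id: str, guest_id: str) -> str:
    """The bot that was not addressed is the interjection candidate."""
    return guest_id if addressee_id == host_id else host_id
```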
Addressee heuristic (Phase 2 v1): simple substring match on bot names. If both names appear, or neither does, the addressee defaults to the host. Phase 2.5 / Phase 3 may improve this with a classifier call.
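The name-match rule fits in a few lines. This is a minimal sketch; `detect_addressee` and its return values are hypothetical. Case-insensitive matching via `str.casefold` is an assumption, since the plan only says "substring match".

```python
from typing import Optional


def detect_addressee(prose: str, host_name: str, guest_name: Optional[str]) -> str:
    """Return "host" or "guest" using the Phase 2 v1 name-match heuristic."""
    if guest_name is None:
        return "host"  # no guest present: the host is the only addressee
    text = prose.casefold()  # assumption: match case-insensitively
    names_guest = guest_name.casefold() in text
    names_host = host_name.casefold() in text
    # Route to the guest only when the guest alone is named;
    # both names or neither falls back to the host.
    return "guest" if names_guest and not names_host else "host"
```

Plain substring matching misfires when one bot name contains the other (for example "Ann" and "Anna"). That is one concrete reason to revisit it with a classifier later.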
Cancel & truncated: unchanged from Phase 1 — both halves of a streaming turn (primary + interjection) cancel together.
regenerate.py changes: parallel to turns.py — multi-entity prompt assembly + multi-witness memory + multi-pair state update. Interjection regeneration is deferred to Phase 2.5 (regenerate only the addressee's turn for v2).
Tests added to tests/test_turn_flow.py: 5 minimum.
- Single-bot turn (no guest): full suite still passes (regression).
- Multi-bot turn, no interjection: `post_turn` produces 1 user_turn + 1 assistant_turn + 6 edge_updates + 2 memory_written events. Mock interjection returns `should_interject=false`.
- Multi-bot turn, with interjection: produces user_turn + 2 assistant_turns + 8 edge_updates (6 for the primary turn + 2 for the interjector's outgoing edges, per step 13d) + 4 memory_written events.
- Multi-bot turn, scene close fires: `scene_closed` + multi-POV summaries written (per T45).
- Addressee detection: prose `"BotB, what do you think?"` routes to BotB as speaker.
Commit: feat: multi-entity turn flow with interjection support
Notes for implementer:
- This task is the largest in Phase 2 by line count. Budget for ~150-300 lines of changes across `turns.py` and tests. The implementer should split commits if it helps clarity (one commit for the primary turn, one for the interjection, one for tests).
- Update the existing `_seed_chat` helper in `tests/test_turn_flow.py` to optionally seed a guest, or add `_seed_chat_with_guest` if that is cleaner.
- The fixture for the LLM mock now needs to provide canned responses for: parse_turn + scene_close_detect + narrative + state_updates×6 + interjection_decision + (optionally) interjection_narrative + state_updates×2 (interjection's outgoing only).
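One way to keep that fixture readable is a scripted fake that serves canned responses per call kind and records the call order. This is a minimal sketch: the `ScriptedLLM` class, its `script`/`complete` methods, and the kind strings are hypothetical. They would need to match however the real LLM client is injected into `post_turn`.

```python
from collections import defaultdict, deque
from typing import Deque, DefaultDict, List


class ScriptedLLM:
    """Hypothetical fake LLM client: canned responses queued per call kind."""

    def __init__(self) -> None:
        self._queues: DefaultDict[str, Deque[object]] = defaultdict(deque)
        self.calls: List[str] = []  # kinds, in the order they were requested

    def script(self, kind: str, *responses: object) -> "ScriptedLLM":
        self._queues[kind].extend(responses)
        return self  # allow chaining

    def complete(self, kind: str) -> object:
        self.calls.append(kind)
        if not self._queues[kind]:
            # Fail loudly so a missing canned response surfaces in the test.
            raise AssertionError(f"no canned response left for {kind!r}")
        return self._queues[kind].popleft()


# Interjection scenario: 6 + 2 state updates, primary + interjection narratives.
llm = (
    ScriptedLLM()
    .script("narrative", "Host reply.")
    .script("interjection_decision", {"should_interject": True})
    .script("interjection_narrative", "Guest cuts in.")
    .script("state_update", *([{}] * 8))
)
```

Afterwards, asserting on `llm.calls` checks both call counts and call order in one place.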
Wave 5 — Polish (parallel)
Three independent tasks. Dispatch all three in parallel after Wave 4b merges.
Task 46: Witness filter test coverage
Files:
- Create: `tests/test_witness_filter_multi.py`
Spec: Phase 1 tested witness filtering with single-bot scenarios. Phase 2 needs explicit tests for the cross-witness cases:
- Memory with witness `[1, 1, 0]`: visible to host, not guest (when guest queries from their POV).
- Memory with witness `[0, 1, 1]`: visible to host and guest, not "you".
- Secondhand-source memories: `source: "told_by:bot_a"`, witness flag for bot_b set, reliability < 1.0.
5 tests minimum.
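The two witness examples read as flags ordered `[you, host, guest]`; this ordering is inferred from the examples, not stated in the spec. The following is a minimal sketch of the visibility predicate under that assumption. The `Memory` dataclass, `visible_to`, `is_secondhand`, and the default values are hypothetical, not the Phase 1 schema.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed flag order, inferred from the spec examples: [you, host, guest].
POV_INDEX = {"you": 0, "host": 1, "guest": 2}


@dataclass
class Memory:
    witness: Tuple[int, int, int]
    source: str = "firsthand"  # hypothetical default for direct witnessing
    reliability: float = 1.0


def visible_to(memory: Memory, pov: str) -> bool:
    """A memory is visible from a POV only if that POV's witness flag is set."""
    return bool(memory.witness[POV_INDEX[pov]])


def is_secondhand(memory: Memory) -> bool:
    """Secondhand memories carry a told_by:<bot_id> source."""
    return memory.source.startswith("told_by:")
```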
Commit: test: witness filter coverage for multi-entity scenarios
Task 47: Bot reset cascades to guest scenes
Files:
- Modify: `chat/state/entities.py` (extend `_apply_bot_reset` to also remove the bot's `guest_bot_id` references in OTHER chats: `UPDATE chats SET guest_bot_id = NULL WHERE guest_bot_id = ?`; remove the bot's activity row in those chats too)
- Modify: tests in `tests/test_reset.py` (add scenario: bot is guest in another bot's chat; reset clears the guest reference)
Spec: Currently bot_reset purges the bot's own chat state, memories, and edges. With Phase 2, a bot can be a guest in another bot's chat — that reference must also clear. Otherwise the host's chat sees a stale guest_bot_id pointing at a phantom bot.
Update _apply_bot_reset handler:
```python
# After existing purges:
conn.execute("UPDATE chats SET guest_bot_id = NULL WHERE guest_bot_id = ?", (bot_id,))
conn.execute("DELETE FROM activity WHERE entity_id = ?", (bot_id,))  # already there; covers all chats
```
(Activity is keyed by entity_id, so the existing line handles cross-chat activity rows already.)
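The cascade can be checked in isolation against an in-memory SQLite database. This is a minimal sketch that assumes simplified `chats` and `activity` tables, containing only the columns the two statements touch; the real schema has more. `reset_guest_refs` is a hypothetical helper standing in for the lines added to `_apply_bot_reset`.

```python
import sqlite3


def reset_guest_refs(conn: sqlite3.Connection, bot_id: str) -> None:
    # New in Phase 2: clear the bot's guest slot in every other chat.
    conn.execute("UPDATE chats SET guest_bot_id = NULL WHERE guest_bot_id = ?", (bot_id,))
    # Existing purge: activity is keyed by entity_id, so this covers all chats.
    conn.execute("DELETE FROM activity WHERE entity_id = ?", (bot_id,))


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE chats (chat_id TEXT PRIMARY KEY, host_bot_id TEXT, guest_bot_id TEXT);
    CREATE TABLE activity (chat_id TEXT, entity_id TEXT, note TEXT);
    INSERT INTO chats VALUES ('chat_bot_a', 'bot_a', 'bot_b'), ('chat_bot_c', 'bot_c', NULL);
    INSERT INTO activity VALUES ('chat_bot_a', 'bot_a', 'pouring tea'),
                                ('chat_bot_a', 'bot_b', 'leaning on the bar');
    """
)
reset_guest_refs(conn, "bot_b")
```

After the reset, BotA's chat has no guest, and only BotA's own activity row remains.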
Tests: 2 minimum, added to tests/test_reset.py.
- BotB is guest in BotA's chat. Reset BotB. Assert `chat_bot_a.guest_bot_id` is NULL.
- BotB has memories (witness flag set, owner=bot_b) from being guest in BotA's chat. Reset BotB. Assert those memories are gone.
Commit: fix: bot_reset cascades to guest references in other chats
Task 48: Phase 2 documentation update
Files:
- Modify: `CLAUDE.md` (add a "Phase 2 status" section; update "Behavioral defaults" with multi-entity additions; add to "Phase 1.5 / 2 cleanup backlog" any v2 follow-ups discovered during execution)
- Modify: `docs/plans/2026-04-26-v1-requirements-design.md` (mark Phase 2 deliverables as "shipped" in the appendix decisions log)
Spec: Documentation-only task. Run last in Phase 2 so it captures any deviations from the plan that emerged during execution. Reflect:
- Multi-entity scene support (you + host + guest).
- Interjection model (default false; explicit signals only).
- Per-POV summaries on close for all witnesses with memory stores.
- Group node populated on first 3-entity scene; updated on close.
- Phase 2 known limitations:
- "Meanwhile…" (scene config 4 — bot+bot without you) deferred to Phase 3.
- Interjection regeneration deferred (regenerate only acts on the addressee turn).
- Addressee detection is a simple name-match heuristic (no classifier call yet).
Commit: docs: phase 2 status, behavioral defaults, deferred items
Wrap-up
After Wave 5 lands:
- Run the full suite on `phase-2`: it should show ~210+ tests passing (168 from Phase 1 + ~45 new).
- Manual smoke:
  - Add a guest to one of the seeded bots' chats via the drawer.
  - Verify "have they met?" prose seeds inter-bot edges.
  - Play a few turns; verify the host responds normally and the guest occasionally interjects.
  - Close the scene; check the drawer for two distinct per-POV summaries.
  - Remove the guest mid-scene; check that scene_closed fires.
  - Reset a guest bot from another chat; verify the guest_bot_id reference clears.
- Push `phase-2` to gitea.
- Open PR `phase-2 → main`.
- Phase 2.5 backlog candidates (track in CLAUDE.md): interjection regenerate UI, classifier-based addressee detection, group-node LLM-merged meta-summary, drawer "first-meeting" gate vs "they already know each other" toggle, witness flag editing in drawer (currently read-only by spec).
Notes for the controller running this plan
- Don't dispatch Wave 4b until Wave 4a is merged AND tested green on `phase-2`. Wave 4b's `turns.py` changes depend on the new `assemble_narrative_prompt` signature from Wave 4a's `prompt.py`; without it, `post_turn` fails at import or call time.
- After each parallel wave, the controller should run a code-review subagent (the `subagent-driven-development` skill's two-stage review pattern) on each task before merging to `phase-2`. For purely mechanical tasks, a combined spec+quality review is acceptable.
- If a parallel wave's merge produces a conflict, the wave's file-disjointness assumption was violated. Bisect the affected pair, fix the offending task in a follow-up commit on `phase-2`, and proceed.
- Token-spend rough estimate: Phase 2 should be ~30-40% the size of Phase 1 (smaller scope; reuses Phase 1 patterns). Per-task token spend should be similar to Phase 1.
- DO NOT modify Phase 1 code paths unless explicitly required (e.g., Wave 5 T47 modifies `_apply_bot_reset` because the cascade is genuinely new behavior). The single-bot path must continue to work end-to-end after each wave.
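The conflict-prevention check (Files sections disjoint per wave) is mechanical enough to script before dispatch. This is a minimal sketch that assumes each task's Files entries have already been extracted into a set of paths. The `overlaps` helper is hypothetical. The Wave 5 file lists in the usage lines come from the T46 to T48 Files sections above.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def overlaps(wave: Dict[str, Set[str]]) -> List[Tuple[str, str, Set[str]]]:
    """Return (task_a, task_b, shared_files) for every colliding pair in a wave."""
    clashes = []
    for (a, files_a), (b, files_b) in combinations(sorted(wave.items()), 2):
        shared = files_a & files_b
        if shared:
            clashes.append((a, b, shared))
    return clashes


wave_5 = {
    "T46": {"tests/test_witness_filter_multi.py"},
    "T47": {"chat/state/entities.py", "tests/test_reset.py"},
    "T48": {"CLAUDE.md", "docs/plans/2026-04-26-v1-requirements-design.md"},
}
print(overlaps(wave_5))  # [] -> safe to dispatch in parallel
```

A non-empty result names exactly which pair to serialize, so the wave can be split before any worktree is created.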