docs: add Phase 2 implementation plan with parallel-safe waves

13 tasks across 6 waves (1, 2, 3, 4a, 4b, 5). Designed for parallel
subagent execution where file-disjointness allows.

Waves 1, 2, 4a, and 5 each contain 2-3 tasks that touch disjoint files
and can be dispatched concurrently via the Agent tool with
isolation: "worktree". Waves 3 (drawer guest support) and 4b (multi-
entity turn flow) are single-task because they touch hot files
(_drawer.html, turns.py) that cannot be safely co-modified.

Plan covers:
- T36: group_node schema + handlers (new migration 0008)
- T37: guest_added / guest_removed event handlers (modifies world.py)
- T38: relationship-seed service ("have they met?")
- T39: interjection classifier service
- T40: multi-entity state-update coordinator (6 directed pairs)
- T41: multi-witness memory write helper
- T42: drawer guest add/remove UI + render
- T43: multi-entity prompt assembly (extends T18)
- T44: multi-entity turn flow (rewrites post_turn)
- T45: multi-entity per-POV summaries on scene close
- T46: witness filter cross-coverage tests
- T47: bot_reset cascades to guest references
- T48: Phase 2 documentation update

Plan also documents:
- Worktree-per-subagent dispatch pattern using Agent isolation flag
- Merge ordering per wave (file-disjointness = conflict-free merges)
- Failure recovery (cancel failed parallel task, re-dispatch as solo)
- Conflict prevention checklist (verify Files sections disjoint per wave)

Tasks file (.tasks.json) carries dependency DAG with `blockedBy` and
`parallelGroup` so a future executing-plans run can dispatch correctly.

NOT EXECUTING. Plan only.
Joseph Doherty
2026-04-26 15:37:07 -04:00
parent d161e7b8e9
commit b8335895e1
2 changed files with 930 additions and 0 deletions
@@ -0,0 +1,910 @@
# Roleplay Engine — Phase 2 Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use `superpowers-extended-cc:executing-plans` to implement this plan task-by-task. Use `superpowers-extended-cc:dispatching-parallel-agents` for the parallel waves below.
**Goal:** Add multi-entity (3-entity) scene support: guest bot can be added to a host's chat; up to 3 entities present simultaneously (you + host + guest); turn flow handles silent witnesses, interjections, per-pair edges, and per-witness memory; drawer reflects guest state; scene close writes per-POV summaries for each present witness.
**Architecture:** Builds on Phase 1's event-sourced architecture. New event kinds (`guest_added`, `guest_removed`, `group_node_initialized`) carry the multi-entity state changes; existing handlers (`edge_update`, `memory_written`) already accept any `source_id`/`target_id` and witness mask, so most schema work is additive. The "have they met?" first-co-appearance prompt runs once per `(botA, botB)` pair and seeds initial inter-bot edges via existing `edge_update` events.
**Tech Stack:** Same as Phase 1 (Python 3.11+, FastAPI, HTMX, SQLite, Featherless). No new dependencies.
**Source-of-truth references:**
- Phase 2 scope: requirements doc §13 "Phase 2 — multi-entity"
- Behavioral details: requirements doc §6.2 (turn-taking with interjection), §7.5 (guest leaves), §8.5 (memory ownership), §11.2 (per-POV summaries on close)
- Conventions: [../../CLAUDE.md](../../CLAUDE.md) §"Behavioral defaults"
- Phase 1 plan (style, TDD pattern): [2026-04-26-v1-phase1-implementation.md](2026-04-26-v1-phase1-implementation.md)
When a task says "see §X", that's the requirements doc unless stated otherwise.
---
## Pre-flight
**Branch:** Create `phase-2` from the latest `main` after Phase 1 has been merged. If Phase 1 is still in PR review, branch off `phase-1` directly:
```bash
# Option A: after main has phase-1 merged
git checkout main && git pull && git checkout -b phase-2
# Option B: continue from phase-1 directly
git checkout phase-1 && git pull && git checkout -b phase-2
```
**Schema baseline:** Phase 1 leaves the DB at version 7. Phase 2 adds **0008_group_node.sql**. No other migrations expected.
**Pinned non-negotiables (carried forward from Phase 1):**
- State changes go through the event log. Use `append_and_apply(conn, kind, payload)` for the live path; `apply_event` only after a fresh `append_event` returning the new id.
- Witness filter every memory read at SQL level (hard `WHERE` constraint; never a soft signal).
- Edges are directed; `botA → botB` and `botB → botA` are independent records.
- Per-POV scene summaries — never write omniscient narration.
- TDD: every task starts with a failing test.
- One commit per task minimum, more if it splits naturally.
**Verification before claiming done:** Use `superpowers-extended-cc:verification-before-completion` — run the test command, paste actual output. Don't assume green.
---
## Parallel-Execution Strategy
This plan is structured into **6 waves** of tasks. Within a wave, tasks are designed to touch disjoint files so they can be executed by parallel subagents safely. Between waves, the controller (you, the controlling Claude session) merges each subagent's commits and verifies the suite stays green before dispatching the next wave.
### How to dispatch a wave in parallel
Use the **Agent tool with `isolation: "worktree"`** so each subagent gets its own git worktree. The runtime cleans up the worktree automatically if no changes are made; otherwise it returns the path + branch for the controller to merge.
In a single message, dispatch all tasks in the wave:
```
Agent({
  description: "Wave 1: T36 group_node schema",
  subagent_type: "general-purpose",
  isolation: "worktree",
  prompt: "<full task text from below>",
})
Agent({
  description: "Wave 1: T37 guest events",
  subagent_type: "general-purpose",
  isolation: "worktree",
  prompt: "<full task text from below>",
})
Agent({
  description: "Wave 1: T38 relationship-seed service",
  subagent_type: "general-purpose",
  isolation: "worktree",
  prompt: "<full task text from below>",
})
```
All three subagents start simultaneously, each working on a private worktree branched off `phase-2`. They cannot see each other's changes (no shared filesystem state) — that's the safety guarantee.
### After a wave completes
1. Each subagent returns its worktree path and commit SHA.
2. **Run a spec + quality reviewer subagent on each completed task** (same pattern as Phase 1 — see `superpowers-extended-cc:requesting-code-review`).
3. **Merge the wave into `phase-2`** in any order (file-disjointness guarantees no conflict). The loop below uses `--no-ff` so each task keeps its own merge commit:
```bash
git checkout phase-2
for branch in <wave-1-branches>; do
  git merge --no-ff "$branch" -m "merge: <task description>"
done
```
4. **Run the full test suite** on the merged `phase-2`. If it's red, the wave's mutual independence assumption was violated — bisect to find the offending pair, fix, re-merge.
5. **Push `phase-2` to gitea** so the work is durable before the next wave starts.
6. Optionally clean up worktrees: `git worktree remove .worktrees/<branch>`.
### Conflict prevention checklist (apply before dispatch)
For each parallel wave, verify the **Files** sections of all tasks in that wave have **no overlapping paths**. The waves below are designed to satisfy this; if you decide to add or merge tasks, re-check.
If a hot file (`chat/web/turns.py`, `chat/services/prompt.py`, `chat/templates/_drawer.html`) needs changes from multiple tasks, do **not** parallelize them — serialize within the wave or split into separate waves.
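The checklist can be automated before dispatch. The sketch below is illustrative only: `find_overlaps` is a hypothetical helper, not part of the codebase, and the file sets are copied by hand from the **Files** sections of T36-T38.
```python
from itertools import combinations


def find_overlaps(wave_files: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return (task_a, task_b, shared_paths) for every task pair sharing a path."""
    overlaps = []
    for (a, files_a), (b, files_b) in combinations(sorted(wave_files.items()), 2):
        shared = files_a & files_b
        if shared:
            overlaps.append((a, b, shared))
    return overlaps


# Wave 1 file lists, copied from the Files sections of T36-T38.
wave_1 = {
    "T36": {
        "chat/db/migrations/0008_group_node.sql",
        "chat/state/group_node.py",
        "tests/test_group_node.py",
    },
    "T37": {"chat/state/world.py", "tests/test_guest_events.py"},
    "T38": {"chat/services/relationship_seed.py", "tests/test_relationship_seed.py"},
}

# A made-up wave that breaks the rule: both tasks touch turns.py.
bad_wave = {
    "TX": {"chat/web/turns.py"},
    "TY": {"chat/web/turns.py", "tests/test_y.py"},
}

wave_1_overlaps = find_overlaps(wave_1)
bad_wave_overlaps = find_overlaps(bad_wave)
```
An empty result means the wave is safe to dispatch in parallel; any non-empty result names the task pair to serialize.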
### Failure recovery
If one subagent in a parallel wave fails (test failures, blocked, infinite loop):
- **Do not block the wave on a failure.** Cancel the failed subagent, merge the others' successful work, and re-dispatch the failed task as a single follow-up.
- If a failure exposes a bad assumption shared by multiple tasks (e.g. an event-payload schema mismatch), pause the wave and revisit the plan.
### Why each wave is parallel-safe
| Wave | Tasks | Hot files touched | Disjoint? |
|------|-------|-------------------|-----------|
| 1 | T36, T37, T38 | new files only + `chat/state/world.py` (T37 only) | ✅ |
| 2 | T39, T40, T41 | new files only + `chat/services/memory_write.py` (T41 only) | ✅ |
| 3 | T42 | `chat/web/drawer.py`, `chat/templates/_drawer.html` | (single task) |
| 4a | T43, T45 | `chat/services/prompt.py` (T43), `chat/services/scene_summarize.py` (T45) | ✅ |
| 4b | T44 | `chat/web/turns.py`, `chat/services/regenerate.py` | (single task; depends on 4a) |
| 5 | T46, T47, T48 | new tests + `chat/state/entities.py` (T47) + docs (T48) | ✅ |
---
## Task overview
```
Wave 1 ─┬─ T36: group_node schema + handler
        ├─ T37: guest_added / guest_removed events
        └─ T38: relationship-seed service ("have they met?")
Wave 2 ─┬─ T39: interjection classifier service
        ├─ T40: multi-entity state-update coordinator
        └─ T41: multi-witness memory write helper
Wave 3 ─── T42: drawer guest support (add/remove + render guest state)
Wave 4a ─┬─ T43: multi-entity prompt assembly (extends assemble_narrative_prompt)
         └─ T45: multi-entity per-POV summaries on scene close
Wave 4b ─── T44: multi-entity turn flow integration (post_turn rewrite)
Wave 5 ─┬─ T46: witness filter test coverage (cross-witness scenarios)
        ├─ T47: bot reset cascades to guest scenes
        └─ T48: Phase 2 documentation update
Critical path: 6 sequential merge points. Total tasks: 13. The wall-clock gain from parallelism depends on subagent dispatch overhead, but in principle Wave 1's 3 tasks can run concurrently in roughly the time of one task.
---
## Wave 1 — Foundation
These three tasks are **fully independent**: T36 adds new files only, T37 modifies `chat/state/world.py` (additive event handlers), T38 adds new files only. Dispatch all three in parallel.
### Task 36: Group node schema + handler
**Files:**
- Create: `chat/db/migrations/0008_group_node.sql`
- Create: `chat/state/group_node.py`
- Create: `tests/test_group_node.py`
**Spec:** Adds the `group_node` table (one row per chat, populated when all three entities are present in a scene) and a projector handler for the `group_node_initialized` event. The group node carries the shared summary, group dynamic, inside jokes, and active threads (Phase 3 will populate `active_threads`; for Phase 2, just `summary` and `dynamic` matter).
**Step 1: Write the failing test**
```python
# tests/test_group_node.py
from chat.db.migrate import apply_migrations
from chat.db.connection import open_db
from chat.eventlog.log import append_event
from chat.eventlog.projector import project
from chat.state.group_node import get_group_node
import chat.state.entities  # noqa: F401 - registers handlers
import chat.state.world  # noqa: F401 - registers handlers
import chat.state.group_node  # noqa: F401 - registers handlers


def test_group_node_initialized_creates_row(tmp_path):
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        # Seed bot, you, chat (minimal world state; no scene yet)
        append_event(conn, kind="bot_authored", payload={
            "id": "bot_a", "name": "BotA", "persona": "...",
            "voice_samples": [], "traits": [], "backstory": "",
            "initial_relationship_to_you": "", "kickoff_prose": "",
        })
        append_event(conn, kind="chat_created", payload={
            "id": "chat_bot_a", "host_bot_id": "bot_a",
            "initial_time": "2026-04-26T20:00:00+00:00",
            "narrative_anchor": "Day 1", "weather": "",
        })
        append_event(conn, kind="group_node_initialized", payload={
            "chat_id": "chat_bot_a",
            "members": ["you", "bot_a", "bot_b"],
            "summary": "",
            "dynamic": "",
        })
        project(conn)
        gn = get_group_node(conn, "chat_bot_a")
        assert gn is not None
        assert gn["members"] == ["you", "bot_a", "bot_b"]
        assert gn["summary"] == ""
```
**Step 2: Run test to verify it fails**
```bash
.venv/bin/pytest tests/test_group_node.py -v
```
Expected: `ModuleNotFoundError: No module named 'chat.state.group_node'`.
**Step 3: Write minimal implementation**
`chat/db/migrations/0008_group_node.sql`:
```sql
CREATE TABLE group_node (
    chat_id      TEXT PRIMARY KEY,
    members_json TEXT NOT NULL,
    summary      TEXT NOT NULL DEFAULT '',
    dynamic      TEXT NOT NULL DEFAULT '',
    threads_json TEXT NOT NULL DEFAULT '[]',
    updated_at   TEXT NOT NULL DEFAULT (datetime('now'))
);
```
`chat/state/group_node.py`:
```python
from __future__ import annotations

import json
from sqlite3 import Connection

from chat.eventlog.projector import on
from chat.eventlog.log import Event


@on("group_node_initialized")
def _apply_group_node_initialized(conn: Connection, e: Event) -> None:
    p = e.payload
    conn.execute(
        "INSERT OR REPLACE INTO group_node "
        "(chat_id, members_json, summary, dynamic, threads_json) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            p["chat_id"],
            json.dumps(p["members"]),
            p.get("summary", ""),
            p.get("dynamic", ""),
            json.dumps(p.get("threads", [])),
        ),
    )


@on("group_node_updated")
def _apply_group_node_updated(conn: Connection, e: Event) -> None:
    """T45 calls this on scene close to rewrite summary + dynamic."""
    p = e.payload
    conn.execute(
        "UPDATE group_node SET summary = ?, dynamic = ?, updated_at = datetime('now') "
        "WHERE chat_id = ?",
        (p.get("summary", ""), p.get("dynamic", ""), p["chat_id"]),
    )


def get_group_node(conn: Connection, chat_id: str) -> dict | None:
    row = conn.execute(
        "SELECT chat_id, members_json, summary, dynamic, threads_json, updated_at "
        "FROM group_node WHERE chat_id = ?",
        (chat_id,),
    ).fetchone()
    if not row:
        return None
    return {
        "chat_id": row[0],
        "members": json.loads(row[1]),
        "summary": row[2],
        "dynamic": row[3],
        "threads": json.loads(row[4]),
        "updated_at": row[5],
    }
```
**Step 4: Run test to verify it passes**
```bash
.venv/bin/pytest tests/test_group_node.py -v
```
Expected: 1 passed.
**Step 5: Commit**
```bash
git add chat/db/migrations/0008_group_node.sql chat/state/group_node.py tests/test_group_node.py
git commit -m "feat: group_node schema + projector handlers"
```
**Notes for the implementer:**
- Add a second test for `group_node_updated`: append init then update, assert `summary` and `dynamic` change but `members` stays.
- Add a test for `get_group_node` returning `None` on a missing chat_id.
- Schema version after migration: 8. The migration runner handles this automatically; no test assertion on schema_version.
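To make the first two notes concrete, here is a minimal sketch of the behavior those extra tests should pin down. It runs the handlers' SQL statements directly against an in-memory SQLite database and bypasses the event log and projector, so treat it as an illustration of the expected rows, not as the test itself. The sample summary and dynamic values are made up.
```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Same DDL as migration 0008.
conn.execute(
    "CREATE TABLE group_node ("
    " chat_id TEXT PRIMARY KEY,"
    " members_json TEXT NOT NULL,"
    " summary TEXT NOT NULL DEFAULT '',"
    " dynamic TEXT NOT NULL DEFAULT '',"
    " threads_json TEXT NOT NULL DEFAULT '[]',"
    " updated_at TEXT NOT NULL DEFAULT (datetime('now')))"
)

# Statement run by the group_node_initialized handler.
conn.execute(
    "INSERT OR REPLACE INTO group_node "
    "(chat_id, members_json, summary, dynamic, threads_json) "
    "VALUES (?, ?, ?, ?, ?)",
    ("chat_bot_a", json.dumps(["you", "bot_a", "bot_b"]), "", "", "[]"),
)

# Statement run by the group_node_updated handler.
conn.execute(
    "UPDATE group_node SET summary = ?, dynamic = ?, updated_at = datetime('now') "
    "WHERE chat_id = ?",
    ("They argued about the map.", "tense", "chat_bot_a"),
)

row = conn.execute(
    "SELECT members_json, summary, dynamic FROM group_node WHERE chat_id = ?",
    ("chat_bot_a",),
).fetchone()
members, summary, dynamic = json.loads(row[0]), row[1], row[2]

# A chat_id with no row yields None, which get_group_node maps to None.
missing = conn.execute(
    "SELECT chat_id FROM group_node WHERE chat_id = ?", ("no_such_chat",)
).fetchone()
```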
---
### Task 37: Guest add / remove events + handlers
**Files:**
- Modify: `chat/state/world.py` (add `_apply_guest_added`, `_apply_guest_removed` handlers; both update `chats.guest_bot_id`)
- Create: `tests/test_guest_events.py`
**Spec:** Two new event kinds.
- `guest_added` payload: `{chat_id, guest_bot_id}`. Handler sets `chats.guest_bot_id = ?`.
- `guest_removed` payload: `{chat_id}`. Handler sets `chats.guest_bot_id = NULL`.
These are pure state mutations — no related side effects. The kickoff parse-and-confirm flow (T13, Phase 1) and the new T39 interjection / T42 drawer routes will append these events.
**Step 1: Write the failing test**
```python
# tests/test_guest_events.py
from chat.db.migrate import apply_migrations
from chat.db.connection import open_db
from chat.eventlog.log import append_event
from chat.eventlog.projector import project
from chat.state.world import get_chat
import chat.state.entities  # noqa: F401 - registers handlers
import chat.state.world  # noqa: F401 - registers handlers


def test_guest_added_sets_guest_bot_id(tmp_path):
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        # Seed both bots and the host's chat
        append_event(conn, kind="bot_authored", payload={
            "id": "bot_a", "name": "BotA", "persona": "...",
            "voice_samples": [], "traits": [], "backstory": "",
            "initial_relationship_to_you": "", "kickoff_prose": "",
        })
        append_event(conn, kind="bot_authored", payload={
            "id": "bot_b", "name": "BotB", "persona": "...",
            "voice_samples": [], "traits": [], "backstory": "",
            "initial_relationship_to_you": "", "kickoff_prose": "",
        })
        append_event(conn, kind="chat_created", payload={
            "id": "chat_bot_a", "host_bot_id": "bot_a",
            "initial_time": "2026-04-26T20:00:00+00:00",
            "narrative_anchor": "Day 1", "weather": "",
        })
        append_event(conn, kind="guest_added", payload={
            "chat_id": "chat_bot_a", "guest_bot_id": "bot_b",
        })
        project(conn)
        chat = get_chat(conn, "chat_bot_a")
        assert chat["guest_bot_id"] == "bot_b"


def test_guest_removed_clears_guest_bot_id(tmp_path):
    # similar: add then remove, assert guest_bot_id is None
    ...
```
**Step 3: Implementation**
In `chat/state/world.py`, add the two handlers next to `_apply_chat_created`:
```python
@on("guest_added")
def _apply_guest_added(conn: Connection, e: Event) -> None:
    p = e.payload
    conn.execute(
        "UPDATE chats SET guest_bot_id = ? WHERE id = ?",
        (p["guest_bot_id"], p["chat_id"]),
    )


@on("guest_removed")
def _apply_guest_removed(conn: Connection, e: Event) -> None:
    p = e.payload
    conn.execute(
        "UPDATE chats SET guest_bot_id = NULL WHERE id = ?",
        (p["chat_id"],),
    )
```
**Step 5: Commit**
```bash
git add chat/state/world.py tests/test_guest_events.py
git commit -m "feat: guest_added / guest_removed event handlers"
```
**Notes:**
- 2 tests minimum (added, removed). Optional third: idempotent re-add (overwrites cleanly).
- Don't add any UI here — T42 handles UI.
---
### Task 38: Relationship-seed service ("have they met?")
**Files:**
- Create: `chat/services/relationship_seed.py`
- Create: `tests/test_relationship_seed.py`
**Spec:** Per requirements §5.2: when two bots first co-appear in a chat, prompt the user with "Have they met before? If yes, write a short prose seed describing how." The seed is parsed via classifier into structured `botA ↔ botB` edge content (summary + initial knowledge facts).
This task adds the **service layer** only. T39 (interjection) doesn't touch this; T42 (drawer guest UI) calls it via a route added there. So at the service level, we just expose:
```python
async def seed_inter_bot_edges(
    client: LLMClient,
    *,
    classifier_model: str,
    bot_a_id: str,
    bot_a_name: str,
    bot_b_id: str,
    bot_b_name: str,
    relationship_prose: str,  # user-supplied prose; empty = "they haven't met"
    timeout_s: float = 30.0,
) -> RelationshipSeed:
    """Parse user-supplied prose into structured edge content for both
    directed pairs (bot_a → bot_b and bot_b → bot_a). Return the
    RelationshipSeed; caller is responsible for emitting two edge_update
    events."""
```
`RelationshipSeed`:
```python
from pydantic import BaseModel, Field


class RelationshipSeed(BaseModel):
    a_to_b_summary: str = ""
    a_to_b_knowledge_facts: list[str] = Field(default_factory=list)
    a_to_b_affinity_delta: int = 0  # signed, -10..+10 typical
    a_to_b_trust_delta: int = 0
    b_to_a_summary: str = ""
    b_to_a_knowledge_facts: list[str] = Field(default_factory=list)
    b_to_a_affinity_delta: int = 0
    b_to_a_trust_delta: int = 0
```
If `relationship_prose` is empty/whitespace, short-circuit and return an empty `RelationshipSeed` (they haven't met → fresh edges with default 50/50).
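The short-circuit can be sketched without the real LLM client. Everything below is a stand-in: `SeedSketch` replaces `RelationshipSeed` with a plain dataclass carrying only two of its fields, and `classify` is a hypothetical callable standing in for the classifier wrapper. The point is only that blank or whitespace-only prose never reaches the classifier.
```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SeedSketch:
    """Stand-in for RelationshipSeed; only two of its fields are shown."""
    a_to_b_summary: str = ""
    a_to_b_knowledge_facts: list[str] = field(default_factory=list)


async def seed_sketch(classify, relationship_prose: str) -> SeedSketch:
    # Blank or whitespace-only prose means "they haven't met": skip the LLM call.
    if not relationship_prose.strip():
        return SeedSketch()
    return await classify(relationship_prose)


calls: list[str] = []


async def fake_classify(prose: str) -> SeedSketch:
    # Records each call so the short-circuit can be observed.
    calls.append(prose)
    return SeedSketch(a_to_b_summary="met at college")


empty = asyncio.run(seed_sketch(fake_classify, "   \n"))
seeded = asyncio.run(seed_sketch(fake_classify, "They met at college."))
```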
**Step 1: Failing test**
```python
import json

import pytest

from chat.llm.mock import MockLLMClient
from chat.services.relationship_seed import seed_inter_bot_edges, RelationshipSeed


@pytest.mark.asyncio
async def test_seed_parses_canned_prose():
    canned = json.dumps({
        "a_to_b_summary": "BotA and BotB went to college together.",
        "a_to_b_knowledge_facts": ["BotB has a younger brother."],
        "a_to_b_affinity_delta": 5,
        "a_to_b_trust_delta": 3,
        "b_to_a_summary": "BotB sees BotA as the responsible one.",
        "b_to_a_knowledge_facts": ["BotA was once a TA."],
        "b_to_a_affinity_delta": 4,
        "b_to_a_trust_delta": 5,
    })
    mock = MockLLMClient(canned=[canned])
    seed = await seed_inter_bot_edges(
        mock, classifier_model="x",
        bot_a_id="bot_a", bot_a_name="BotA",
        bot_b_id="bot_b", bot_b_name="BotB",
        relationship_prose="They went to college together; BotB still sees BotA as the responsible one.",
    )
    assert "college" in seed.a_to_b_summary
    assert seed.a_to_b_affinity_delta == 5


@pytest.mark.asyncio
async def test_seed_empty_prose_returns_empty():
    mock = MockLLMClient(canned=[])  # never called
    seed = await seed_inter_bot_edges(
        mock, classifier_model="x",
        bot_a_id="bot_a", bot_a_name="BotA",
        bot_b_id="bot_b", bot_b_name="BotB",
        relationship_prose="",
    )
    assert seed == RelationshipSeed()
```
**Step 3: Minimal impl**
Wraps `classify()` from `chat.llm.classify` with a `RelationshipSeed` schema and a system prompt explaining the task.
**Step 5: Commit**
```bash
git add chat/services/relationship_seed.py tests/test_relationship_seed.py
git commit -m "feat: relationship-seed service for first-co-appearance prompt"
```
---
## Wave 2 — Services
After Wave 1 merges, dispatch Wave 2 in parallel. T39 and T40 are new files; T41 modifies `chat/services/memory_write.py` (additive — adds a new function alongside existing `record_turn_memory`).
### Task 39: Interjection classifier service
**Files:**
- Create: `chat/services/interjection.py`
- Create: `tests/test_interjection.py`
**Spec:** Per requirements §6.2: when a guest is present and the addressee bot has just spoken, decide whether the *non-addressee* bot interjects. Classifier returns `{should_interject: bool, reason: str}`. Caller (T44 turn flow) generates the interjection beat as a brief follow-on response if `should_interject`.
**Public API:**
```python
class InterjectionDecision(BaseModel):
    should_interject: bool = False
    reason: str = ""


async def detect_interjection(
    client: LLMClient,
    *,
    classifier_model: str,
    addressee_name: str,
    addressee_just_said: str,
    silent_witness_name: str,
    silent_witness_persona: str,
    silent_witness_edge_to_addressee: dict,  # {affinity, trust, summary}
    silent_witness_edge_to_you: dict,
    you_just_said: str,
    timeout_s: float = 30.0,
) -> InterjectionDecision:
    """Decide whether the silent witness bot interjects after the addressee
    finishes speaking. Conservative bias: most turns should NOT interject
    (return False). Trigger only when the witness's character would
    plausibly speak up: jealousy, surprise, agreement worth voicing,
    correcting a falsehood, etc.
    """
```
Classifier system prompt should explicitly bias toward `should_interject=false` (per spec: "addressee gets the floor"; interjection is the exception).
**Tests:** 3 minimum.
1. Mock returns `{should_interject: true, reason: "..."}` → result is True.
2. Mock returns `{should_interject: false}` → result is False.
3. Classifier failure → fallback default (`should_interject=false`, `reason="fallback"`).
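The fallback in test 3 can be sketched as follows. This is an assumption-laden illustration: `DecisionSketch` is a dataclass stand-in for `InterjectionDecision`, and `classify` is a hypothetical zero-argument coroutine returning the parsed classifier output as a dict. Any failure, including a malformed payload, collapses to the conservative default.
```python
import asyncio
from dataclasses import dataclass


@dataclass
class DecisionSketch:
    """Stand-in for InterjectionDecision."""
    should_interject: bool = False
    reason: str = ""


async def decide_with_fallback(classify) -> DecisionSketch:
    try:
        raw = await classify()
        return DecisionSketch(bool(raw["should_interject"]), str(raw.get("reason", "")))
    except Exception:
        # Timeouts, transport errors, and malformed payloads all land here.
        return DecisionSketch(should_interject=False, reason="fallback")


async def says_yes():
    return {"should_interject": True, "reason": "jealousy"}


async def says_no():
    return {"should_interject": False}


async def times_out():
    raise TimeoutError("classifier did not answer")


yes = asyncio.run(decide_with_fallback(says_yes))
no = asyncio.run(decide_with_fallback(says_no))
failed = asyncio.run(decide_with_fallback(times_out))
```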
**Commit:** `feat: interjection classifier service`
---
### Task 40: Multi-entity state-update coordinator
**Files:**
- Create: `chat/services/multi_state_update.py`
- Create: `tests/test_multi_state_update.py`
**Spec:** Wraps the existing `chat.services.state_update.compute_state_update` (single-pair) into a coordinator that runs state updates for **all directed pairs of present entities**. With 3 entities (you, host, guest), that's 6 pairs:
```
you → host, host → you
you → guest, guest → you
host → guest, guest → host
```
Returns a list of `(source_id, target_id, StateUpdate)` tuples; caller (T44) emits one `edge_update` event per tuple via `append_and_apply`.
**Public API:**
```python
async def compute_state_updates_for_present(
    client: LLMClient,
    *,
    classifier_model: str,
    present_ids: list[str],  # e.g. ["you", "bot_a", "bot_b"]
    present_names: dict[str, str],  # id -> display name
    personas: dict[str, str],  # id -> persona blob
    prior_edges: dict[tuple[str, str], dict],  # (src, tgt) -> {affinity, trust, summary}
    recent_dialogue: list[dict],  # [{speaker, text}, ...]
    timeout_s: float = 30.0,
) -> list[tuple[str, str, StateUpdate]]:
    """Run compute_state_update for every directed pair where source != target.
    Returns a list of (source_id, target_id, update) tuples. Self-pairs
    (source == target) are skipped.
    """
```
Implementation: nested loops over `present_ids`, sequential calls to `compute_state_update` (parallel calls would exceed the Featherless 2-connection cap from the FeatherlessClient semaphore).
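A minimal sketch of the pair enumeration, assuming a hypothetical `compute_pair` coroutine in place of `compute_state_update`. It uses `itertools.permutations` instead of hand-written nested loops; for distinct ids the two produce the same directed, non-self pairs. Each call is awaited in turn, so only one classifier request is in flight at a time.
```python
import asyncio
from itertools import permutations


async def updates_for_present(present_ids, compute_pair):
    """Await compute_pair sequentially for every ordered pair of distinct ids."""
    results = []
    for source_id, target_id in permutations(present_ids, 2):
        update = await compute_pair(source_id, target_id)
        results.append((source_id, target_id, update))
    return results


async def fake_compute_pair(source_id, target_id):
    # Placeholder payload; the real StateUpdate shape is defined in Phase 1.
    return {"source": source_id, "target": target_id}


three = asyncio.run(updates_for_present(["you", "bot_a", "bot_b"], fake_compute_pair))
two = asyncio.run(updates_for_present(["you", "bot_a"], fake_compute_pair))
pairs_three = [(s, t) for s, t, _ in three]
```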
**Tests:** 3 minimum.
1. With 2 present (you, host) → returns 2 updates (existing 1A/2D parity).
2. With 3 present (you, host, guest) → returns 6 updates, one per directed non-self pair.
3. Failures in one pair don't kill the whole batch (per-pair `compute_state_update` already has a default fallback).
**Commit:** `feat: multi-entity state-update coordinator`
---
### Task 41: Multi-witness memory write helper
**Files:**
- Modify: `chat/services/memory_write.py` (add `record_turn_memory_for_present` alongside existing `record_turn_memory`; do NOT remove or change `record_turn_memory`)
- Add tests to: `tests/test_memory_write.py`
**Spec:** Currently Phase 1's `record_turn_memory(conn, *, chat_id, host_bot_id, narrative_text, ...)` writes a single memory event for the host bot's POV. With a guest present, we need:
- One memory in the host's store (witness mask `[1, 1, 1]` if you/host/guest present)
- One memory in the guest's store (same witness mask, owner = guest_bot_id)
"You" still doesn't have a memory store in v1 (per §5.4 / §11.2).
**New helper:**
```python
def record_turn_memory_for_present(
    conn,
    *,
    chat_id: str,
    host_bot_id: str,
    guest_bot_id: str | None,
    narrative_text: str,
    scene_id: int | None = None,
    chat_clock_at: str | None = None,
    source: str = "direct",
    significance: int = 1,
) -> dict[str, tuple[int, int]]:
    """Write a memory_written event for each present bot witness (host
    always; guest if guest_bot_id is not None). Returns {bot_id:
    (event_id, memory_id)}.

    Witness mask is [1, 1, 1] when guest is present, [1, 1, 0] otherwise
    (mirrors Phase 1 single-bot behavior when guest_bot_id is None).
    """
```
Implementation: appends one `memory_written` event per present bot, calling `append_and_apply` for each, and queries the resulting `memories.id` per owner+chat just like Phase 1's `record_turn_memory`.
**Tests:** 3 minimum, added to `tests/test_memory_write.py`:
1. With `guest_bot_id=None`, behaves identically to `record_turn_memory` (one memory for host, witness `[1, 1, 0]`).
2. With `guest_bot_id="bot_b"`, writes two memories — one each for host and guest, both with witness `[1, 1, 1]`.
3. Returned dict keys match `{host_bot_id, guest_bot_id}` (or just `{host_bot_id}` when no guest).
**Commit:** `feat: multi-witness memory write helper`
---
## Wave 3 — Drawer guest support (single task)
This wave is one task because all Phase 2 drawer work touches the same two files (`chat/web/drawer.py` and `chat/templates/_drawer.html`). Splitting would force serial execution with conflict resolution. Single-task wave runs alone.
### Task 42: Drawer guest support (add/remove + render)
**Files:**
- Modify: `chat/web/drawer.py` (add `POST /chats/{chat_id}/drawer/guest/add`, `POST /chats/{chat_id}/drawer/guest/remove`; extend `drawer` GET handler to query guest state when present)
- Modify: `chat/templates/_drawer.html` (render guest activity, guest edges, group node summary; add "Add guest" form and "Remove guest" button when applicable)
- Create: `tests/test_drawer_guest.py`
**Spec:**
**GET /chats/{chat_id}/drawer** (extend, don't replace):
- Read `chat["guest_bot_id"]` from the existing `get_chat` query.
- If guest present: also fetch `get_bot(conn, guest_bot_id)`, `get_activity(conn, guest_bot_id)`, edges in both `host ↔ guest` directions, edges in both `you ↔ guest` directions, and `get_group_node(conn, chat_id)`.
- Pass all of this to the template.
**Template changes:**
- New section "Guest" rendering guest's name, activity, and the four edges involving the guest.
- New section "Group" rendering `group_node.summary` and `group_node.dynamic` when present.
- "Add guest" button → expands form with: bot selector (dropdown of authored bots not currently in this chat) + relationship prose textarea (the "have they met?" prompt).
- "Remove guest" button visible when a guest is present.
**POST /chats/{chat_id}/drawer/guest/add** route:
1. Read form: `guest_bot_id`, `relationship_prose`.
2. 404 if chat or guest_bot is missing.
3. 400 if guest_bot_id == host_bot_id.
4. 400 if a guest is already present.
5. Call `seed_inter_bot_edges` (T38) with the prose. May produce empty seed if prose is blank.
6. Append events: `guest_added`, then up to 2 `edge_update` events (host ↔ guest deltas from the seed). Use `append_and_apply` for each.
7. If all 3 entities are now present and no `group_node` row exists for this chat, append `group_node_initialized` with members=[you, host, guest] and empty summary/dynamic.
8. Return refreshed drawer partial.
**POST /chats/{chat_id}/drawer/guest/remove** route:
1. 404 if chat missing; 400 if no guest present.
2. Append `scene_closed` for the active scene (per §7.5: removing the guest closes the current scene).
3. Append `guest_removed`.
4. (Per §7.5 the host's chat then implicitly opens a new scene with you+host. For Phase 2, leave that as a manual "next user message creates the new scene" — same as Phase 1 mid-chat reset semantics. Phase 3 may auto-open.)
5. Return refreshed drawer partial.
**Tests (`tests/test_drawer_guest.py`):** 6 minimum.
1. GET drawer with no guest → no "Guest" section in body.
2. POST add guest → 303-or-200 with refreshed drawer; chat.guest_bot_id is set; `group_node` row created; relationship-seed mock returns canned values; edges have the seeded values.
3. POST add guest with empty relationship_prose → guest added; `seed_inter_bot_edges` short-circuits; edges remain at default 50/50.
4. POST add guest when one is already present → 400.
5. POST remove guest → guest_bot_id NULL, scene_closed event written.
6. GET drawer with guest present → "Guest" section + group_node summary visible.
**Commit:** `feat: drawer guest add/remove + render`
**Notes for implementer:**
- The guest-bot-selector dropdown lists bots from `list_bots(conn)` minus the host. Don't filter for "bots not in any chat" — guests can be in multiple chats simultaneously (each chat has its own scene state).
- The "have they met?" prose textarea is the per-pair prompt. v1 only fires it on first co-appearance globally; for v2, fire it every time a `(host, guest)` pair has no existing `host → guest` edge. After the first add, the edge exists, so subsequent adds skip the prose (or render it disabled with "you've already met"). Treat this as Phase 2.5 polish if it gets fiddly — for T42 just always show the prose textarea, blank by default.
- The drawer route already uses `Depends(get_conn)` and templates; reuse the existing dependency and TEMPLATES instance.
---
## Wave 4a — Multi-entity prompt + scene close (parallel)
T43 and T45 touch different files (`prompt.py` and `scene_summarize.py`). Dispatch both in parallel.
### Task 43: Multi-entity prompt assembly
**Files:**
- Modify: `chat/services/prompt.py` (extend `assemble_narrative_prompt` to handle a `guest_id` parameter and fetch guest activity, guest edge, group node into the prompt blocks)
- Add tests to: `tests/test_prompt.py`
**Spec:** The current `assemble_narrative_prompt(conn, *, chat_id, speaker_bot_id, addressee="you", ...)` only handles you+host. Extend:
- Accept a `guest_id: str | None = None` parameter (auto-fetched from `chat.guest_bot_id` if not passed; explicit override for tests).
- When `guest_id` is provided:
- Activity block includes the guest's activity (`get_activity(conn, guest_id)`).
- If `speaker_bot_id == guest_id`, the addressee defaults to "you" but caller can override.
- "Speaker's other edges" SHOULD-tier block includes speaker → non-addressee (e.g., host → guest if speaker is host and addressee is you).
- MUST-tier identity block unchanged (still just speaker).
- Group-node summary becomes a SHOULD-tier block when all three are present (after MUST, before retrieved memories).
- Token budget tier ordering unchanged.
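For orientation, tier-ordered trimming can be sketched like this. It is an assumption about how the budget logic behaves, not a copy of `prompt.py`: the `trim_to_budget` helper, the block names, and the token counts are all made up. NICE blocks are dropped first, then SHOULD blocks, and MUST blocks are never dropped.
```python
def trim_to_budget(blocks, budget):
    """blocks: list of (name, tier, tokens) tuples in prompt order.

    Drops NICE blocks, then SHOULD blocks, starting from the end of the
    list, until the total fits. MUST blocks always survive.
    """
    kept = list(blocks)
    for tier in ("NICE", "SHOULD"):
        for block in reversed(blocks):
            if sum(tokens for _, _, tokens in kept) <= budget:
                return kept
            if block[1] == tier:
                kept.remove(block)
    return kept


blocks = [
    ("speaker_identity", "MUST", 300),
    ("edge_to_addressee", "MUST", 100),
    ("group_node_summary", "SHOULD", 80),
    ("guest_activity", "NICE", 60),
]

roomy = [name for name, _, _ in trim_to_budget(blocks, 600)]
tight = [name for name, _, _ in trim_to_budget(blocks, 500)]
tighter = [name for name, _, _ in trim_to_budget(blocks, 420)]
```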
**Tests:** 4 minimum, added to `tests/test_prompt.py`:
1. With `guest_id=None`, output matches existing 2-entity behavior (regression).
2. With `guest_id="bot_b"` present and group_node populated, the assembled system message contains: speaker identity, guest activity, group_node summary, host→guest edge for the speaker.
3. Speaker is the guest (`speaker_bot_id == guest_id`), addressee="you" → guest's edges and group node correctly oriented.
4. Tight budget forces NICE-trim of guest activity → MUST blocks (speaker identity, edge_to_addressee, last 4 turns) survive.
**Commit:** `feat: multi-entity prompt assembly with guest activity, edges, group node`
---
### Task 45: Multi-entity per-POV summaries on scene close
**Files:**
- Modify: `chat/services/scene_summarize.py` (extend `apply_scene_close_summary` to write per-POV summaries for **each present witness** with a memory store, not just host)
- Modify: tests in `tests/test_per_pov_summary.py`
**Spec:** Phase 1's `apply_scene_close_summary` only summarizes from the host bot's POV. For Phase 2:
- Determine present witnesses with memory stores: host always; guest if `chat.guest_bot_id is not None`.
- For each, generate an independent per-POV summary via `summarize_scene` (the existing classifier wrapper). Each call uses **that bot's** persona, `you_name`, prior `bot → you` edge summary, and the same dialogue.
- Update each owner's memories of the closing scene with their per-POV summary.
- Update **all directed bot → you edges** with per-POV-derived `summary` content.
- If `group_node` exists for this chat, also append a `group_node_updated` event with a new `summary` and `dynamic` derived from the group view. The eventual approach is one extra `summarize_scene` call with `bot_name="group"`, `bot_persona="all participants"` to produce a meta-summary. For the first cut, the meta-summary can be a naive concatenation of the host's and the guest's per-POV summaries, which keeps the `summarize_scene` call count at one per bot witness; the full LLM-merged group view is deferred to Phase 2.5.
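
The per-witness loop and the naive group concatenation described above might look roughly like this. It is a sketch under assumptions: the `summarize` callable stands in for the real `summarize_scene` wrapper (whose actual signature is not shown in this plan), and the bot dictionaries and helper names are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical stand-in for summarize_scene: (bot_name, bot_persona, dialogue) -> summary.
Summarizer = Callable[[str, str, str], str]


def per_pov_summaries(
    host: dict, guest: Optional[dict], dialogue: str, summarize: Summarizer
) -> dict:
    """Return one summary per present bot witness, keyed by bot id."""
    witnesses = [host] + ([guest] if guest is not None else [])
    return {
        bot["id"]: summarize(bot["name"], bot["persona"], dialogue)
        for bot in witnesses
    }


def naive_group_summary(summaries: dict) -> str:
    """Naive meta-summary: concatenate per-POV summaries in witness order."""
    return " ".join(summaries.values())


def fake_summarize(name: str, persona: str, dialogue: str) -> str:
    # Deterministic stub so the sketch runs without an LLM.
    return f"{name} ({persona}) recalls: {dialogue}"


host = {"id": "bot_a", "name": "BotA", "persona": "wry innkeeper"}
guest = {"id": "bot_b", "name": "BotB", "persona": "tired knight"}

summaries = per_pov_summaries(host, guest, "a toast was shared", fake_summarize)
group_text = naive_group_summary(summaries)
solo = per_pov_summaries(host, None, "a toast was shared", fake_summarize)
```

Because the group text is built from the existing per-POV outputs, the summarizer is invoked exactly once per bot witness, which is what the mock-call-count test relies on.
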
**Tests:** 4 minimum, added to `tests/test_per_pov_summary.py`:
1. With no guest, behavior matches Phase 1 (regression test).
2. With guest, `apply_scene_close_summary` calls `summarize_scene` twice (one per bot witness) — assert mock called 2x.
3. After close, each bot's memories of the closed scene have their respective per-POV summary (different text).
4. With group_node present, after close `get_group_node(conn, chat_id).summary` is updated.
**Commit:** `feat: per-POV summaries on close for each present witness`
---
## Wave 4b — Turn flow integration (single task; depends on 4a)
T44 ties everything together. It modifies `chat/web/turns.py` (post_turn) and `chat/services/regenerate.py` to use the new multi-entity primitives. Must run after Wave 4a is merged so `assemble_narrative_prompt` accepts `guest_id` and `apply_scene_close_summary` handles guest.
### Task 44: Multi-entity turn flow
**Files:**
- Modify: `chat/web/turns.py` (rewrite `post_turn` to: parse turn → optionally close scene → assemble prompt with guest → narrative stream → write memories for ALL witnesses → state updates for ALL pairs → interjection check + interjection narrative if needed)
- Modify: `chat/services/regenerate.py` (mirror the changes — regenerated turn rebuilds with guest in scope)
- Modify: tests in `tests/test_turn_flow.py` (add multi-entity scenarios)
**Spec:** Refactored `post_turn` flow:
```
1. Validate prose (existing 400 check).
2. Look up chat, host_bot, guest_bot (None if no guest).
3. Parse turn (existing parse_turn).
4. Append user_turn event.
5. Append assistant_turn_started.
6. Detect scene close (existing path; runs even with guest).
7. (Recent dialogue read with multi-witness in mind — same query.)
8. Determine ADDRESSEE: simplest v2 heuristic — addressee is host unless
prose explicitly names guest_bot.name. Pass to assemble_narrative_prompt.
9. Assemble narrative prompt with speaker=addressee, guest_id passed.
10. Stream narrative; broadcast tokens; commit assistant_turn (existing).
11. Write memories: record_turn_memory_for_present(host, guest).
12. State updates: compute_state_updates_for_present, then append_and_apply
one edge_update per pair.
13. INTERJECTION CHECK (only if guest present; silent witness = the bot that was not the addressee):
a. Call detect_interjection with the silent witness as candidate.
b. If should_interject: assemble narrative prompt with speaker=silent_witness,
addressee=host (or whoever just spoke), and instruct briefly.
c. Stream second narrative; broadcast as second turn_html; commit second
assistant_turn event.
d. Run state updates + memory writes for the interjection turn too
(smaller scope — just the interjector's outgoing edges + memories).
14. Scene close summary (existing path; now multi-witness via T45).
15. Broadcast turn_html for primary + interjection (if any).
16. Return 204.
```
**Addressee heuristic (Phase 2 v1):** simple substring match on bot names. If both names appear or neither: addressee defaults to host. Phase 2.5 / Phase 3 may improve with a classifier call.
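
A minimal sketch of that heuristic follows. The function name `pick_addressee` and the case-insensitive comparison are assumptions for illustration; the plan only specifies a substring match with a host default.

```python
from typing import Optional


def pick_addressee(prose: str, host_name: str, guest_name: Optional[str]) -> str:
    """Return "host" or "guest" using a substring match on bot names.

    Assumption: matching is case-insensitive. If both names or neither
    name appear, the addressee defaults to the host.
    """
    if guest_name is None:
        return "host"
    text = prose.casefold()
    names_host = host_name.casefold() in text
    names_guest = guest_name.casefold() in text
    if names_guest and not names_host:
        return "guest"
    return "host"


only_guest = pick_addressee("BotB, what do you think?", "BotA", "BotB")
both = pick_addressee("BotA and BotB, listen.", "BotA", "BotB")
neither = pick_addressee("What a night.", "BotA", "BotB")
no_guest = pick_addressee("BotB, are you there?", "BotA", None)
```

A plain substring match also fires when a short name occurs inside another word, which is one reason a classifier-based replacement is listed as a follow-up.
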
**Cancel & truncated:** unchanged from Phase 1 — both halves of a streaming turn (primary + interjection) cancel together.
**`regenerate.py` changes:** parallel to `turns.py` — multi-entity prompt assembly + multi-witness memory + multi-pair state update. Interjection regeneration is deferred to Phase 2.5 (regenerate only the addressee's turn for v2).
**Tests added to `tests/test_turn_flow.py`:** 5 minimum.
1. Single-bot turn (no guest): full suite still passes (regression).
2. Multi-bot turn, no interjection: `post_turn` produces 1 user_turn + 1 assistant_turn + 6 edge_updates + 2 memory_written events. Mock interjection returns `should_interject=false`.
3. Multi-bot turn, with interjection: produces user_turn + 2 assistant_turns + 8 edge_updates (6 for the primary turn, 2 for the interjector's outgoing edges) + 4 memory_written events.
4. Multi-bot turn, scene close fires: `scene_closed` + multi-POV summaries written (per T45).
5. Addressee detection: prose `"BotB, what do you think?"` routes to BotB as speaker.
**Commit:** `feat: multi-entity turn flow with interjection support`
**Notes for implementer:**
- This task is the largest in Phase 2 by line count. Budget for ~150-300 lines of changes across `turns.py` and tests. The implementer should split commits if it helps clarity (one commit for primary turn, one for interjection, one for tests).
- Update the existing `_seed_chat` helper in `tests/test_turn_flow.py` to optionally seed a guest, and add `_seed_chat_with_guest` if cleaner.
- The fixture for the LLM mock now needs to provide canned responses for: parse_turn + scene_close_detect + narrative + state_updates×6 + interjection_decision + (optionally) interjection_narrative + state_updates×2 (interjection's outgoing only).
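
The ×6 and ×2 state-update counts in the fixture note follow from counting directed pairs among three entities. A short sketch, using placeholder entity labels rather than real bot ids:

```python
from itertools import permutations

# Placeholder labels for the three scene participants.
entities = ["you", "host", "guest"]

# Every ordered (source, target) pair with source != target.
all_pairs = list(permutations(entities, 2))


def outgoing(source: str, pairs: list) -> list:
    """Directed pairs whose source is the given entity."""
    return [pair for pair in pairs if pair[0] == source]


# If the guest interjects, only the guest's outgoing edges are updated.
interjector_pairs = outgoing("guest", all_pairs)
```

Three entities give 3 × 2 = 6 directed pairs for the primary turn, and any single interjector has 2 outgoing edges.
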
---
## Wave 5 — Polish (parallel)
Three independent tasks. Dispatch all three in parallel after Wave 4b merges.
### Task 46: Witness filter test coverage
**Files:**
- Create: `tests/test_witness_filter_multi.py`
**Spec:** Phase 1 tested witness filtering with single-bot scenarios. Phase 2 needs explicit tests for the cross-witness cases (witness vectors are ordered `[you, host, guest]`):
- Memory with witness `[1, 1, 0]`: visible to you and host, not guest (when guest queries from their POV).
- Memory with witness `[0, 1, 1]`: visible to host and guest, not "you".
- Secondhand-source memories: `source: "told_by:bot_a"`, witness flag for bot_b set, reliability < 1.0.
5 tests minimum.
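
As a rough illustration of what these tests exercise, here is a sketch of a witness filter. It assumes the `[you, host, guest]` slot order that the two vector cases above imply; the memory dictionaries and helper names are hypothetical, not the project's schema.

```python
# Assumed slot order, inferred from the listed cases.
WITNESS_SLOTS = ("you", "host", "guest")


def visible_to(memory: dict, viewer: str) -> bool:
    """A memory is visible when the viewer's witness flag is set."""
    return bool(memory["witness"][WITNESS_SLOTS.index(viewer)])


def filter_memories(memories: list, viewer: str) -> list:
    """Return ids of the memories the viewer is allowed to see."""
    return [m["id"] for m in memories if visible_to(m, viewer)]


memories = [
    {"id": "m1", "witness": [1, 1, 0], "source": "direct", "reliability": 1.0},
    {"id": "m2", "witness": [0, 1, 1], "source": "direct", "reliability": 1.0},
    # Secondhand: the guest (bot_b) was told by bot_a, so reliability is below 1.0.
    {"id": "m3", "witness": [0, 0, 1], "source": "told_by:bot_a", "reliability": 0.6},
]

guest_view = filter_memories(memories, "guest")
host_view = filter_memories(memories, "host")
you_view = filter_memories(memories, "you")
```
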
**Commit:** `test: witness filter coverage for multi-entity scenarios`
---
### Task 47: Bot reset cascades to guest scenes
**Files:**
- Modify: `chat/state/entities.py` (`_apply_bot_reset` extended to also remove the bot's `guest_bot_id` references in OTHER chats: `UPDATE chats SET guest_bot_id = NULL WHERE guest_bot_id = ?`; remove the bot's activity row in those chats too)
- Modify: tests in `tests/test_reset.py` (add scenario: bot is guest in another's chat; reset clears the guest reference)
**Spec:** Currently `bot_reset` purges the bot's own chat state, memories, and edges. With Phase 2, a bot can be a guest in another bot's chat — that reference must also clear. Otherwise the host's chat sees a stale guest_bot_id pointing at a phantom bot.
Update `_apply_bot_reset` handler:
```python
# After existing purges:
conn.execute("UPDATE chats SET guest_bot_id = NULL WHERE guest_bot_id = ?", (bot_id,))
conn.execute("DELETE FROM activity WHERE entity_id = ?", (bot_id,)) # already there; covers all chats
```
(Activity is keyed by entity_id, so the existing line handles cross-chat activity rows already.)
**Tests:** 2 minimum, added to `tests/test_reset.py`.
1. BotB is guest in BotA's chat. Reset BotB. Assert `chat_bot_a.guest_bot_id` is NULL.
2. BotB has memories (witness flag set, owner=bot_b) from being guest in BotA's chat. Reset BotB. Assert those memories are gone.
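
The first test can be prototyped against an in-memory SQLite database. This is a sketch only: the two-table schema and the `apply_guest_cascade` helper are simplified assumptions, and only the two SQL statements come from the handler snippet above.

```python
import sqlite3


def apply_guest_cascade(conn: sqlite3.Connection, bot_id: str) -> None:
    """Clear guest references to the bot and drop its activity row."""
    conn.execute(
        "UPDATE chats SET guest_bot_id = NULL WHERE guest_bot_id = ?", (bot_id,)
    )
    conn.execute("DELETE FROM activity WHERE entity_id = ?", (bot_id,))


conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema, for illustration only.
conn.executescript(
    """
    CREATE TABLE chats (id TEXT PRIMARY KEY, host_bot_id TEXT NOT NULL, guest_bot_id TEXT);
    CREATE TABLE activity (entity_id TEXT PRIMARY KEY, description TEXT);
    INSERT INTO chats VALUES ('chat_bot_a', 'bot_a', 'bot_b');
    INSERT INTO chats VALUES ('chat_bot_c', 'bot_c', NULL);
    INSERT INTO activity VALUES ('bot_a', 'tending bar');
    INSERT INTO activity VALUES ('bot_b', 'polishing a sword');
    """
)

apply_guest_cascade(conn, "bot_b")

guest_after = conn.execute(
    "SELECT guest_bot_id FROM chats WHERE id = 'chat_bot_a'"
).fetchone()[0]
remaining_activity = [
    row[0] for row in conn.execute("SELECT entity_id FROM activity ORDER BY entity_id")
]
```
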
**Commit:** `fix: bot_reset cascades to guest references in other chats`
---
### Task 48: Phase 2 documentation update
**Files:**
- Modify: `CLAUDE.md` (add "Phase 2 status" section; update "Behavioral defaults" with multi-entity additions; add to "Phase 1.5 / 2 cleanup backlog" any v2 follow-ups discovered during execution)
- Modify: `docs/plans/2026-04-26-v1-requirements-design.md` (mark Phase 2 deliverables as "shipped" in the appendix decisions log)
**Spec:** Documentation-only task. Run last in Phase 2 so it captures any deviations from the plan that emerged during execution. Reflect:
- Multi-entity scene support (you + host + guest).
- Interjection model (default false; explicit signals only).
- Per-POV summaries on close for all witnesses with memory stores.
- Group node populated on first 3-entity scene; updated on close.
- Phase 2 known limitations:
- "Meanwhile…" (scene config 4 — bot+bot without you) deferred to Phase 3.
- Interjection regeneration deferred (regenerate only acts on the addressee turn).
- Addressee detection is a simple name-match heuristic (no classifier call yet).
**Commit:** `docs: phase 2 status, behavioral defaults, deferred items`
---
## Wrap-up
After Wave 5 lands:
1. **Run full suite** on `phase-2`: should be ~210+ tests passing (168 from Phase 1 + ~45 new).
2. **Manual smoke**:
- Add a guest to one of the seeded bots' chats via the drawer.
- Verify "have they met?" prose seeds inter-bot edges.
- Play a few turns; verify host responds normally; verify guest occasionally interjects.
- Close the scene; check drawer for two distinct per-POV summaries.
- Remove guest mid-scene; check scene_closed fires.
- Reset a guest bot from another chat; verify guest_bot_id reference clears.
3. **Push `phase-2`** to gitea.
4. **Open PR** `phase-2 → main`.
5. **Phase 2.5 backlog candidates** (track in CLAUDE.md): interjection regenerate UI, classifier-based addressee detection, group-node LLM-merged meta-summary, drawer "first-meeting" gate vs "they already know each other" toggle, witness flag editing in drawer (currently read-only by spec).
---
## Notes for the controller running this plan
- **Don't dispatch Wave 4b until Wave 4a is merged AND tested green on `phase-2`.** Wave 4b's `turns.py` changes call `assemble_narrative_prompt` with the new `guest_id` parameter from Wave 4a's `prompt.py`; against the old signature, calls that pass `guest_id` fail at runtime with an unexpected-keyword `TypeError`.
- **After each parallel wave**, the controller should run a code-review subagent (`subagent-driven-development` skill's two-stage review pattern) on each task before merging to `phase-2`. For purely mechanical tasks, a combined spec+quality review is acceptable.
- **If a parallel wave's merge produces a conflict**, the wave's file-disjointness assumption was violated. Identify the two tasks that touched the same file, fix the offending task in a follow-up commit on `phase-2`, and proceed.
- **Token-spend rough estimate**: Phase 2 should be ~30-40% the size of Phase 1 (smaller scope; reuses Phase 1 patterns). Per-task token spend similar to Phase 1.
- **DO NOT modify Phase 1 code paths** unless explicitly required (e.g., Wave 5 T47 modifies `_apply_bot_reset` because the cascade is genuinely new behavior). The single-bot path must continue to work end-to-end after each wave.
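
One way a controller could decide what to dispatch next is to read the `blockedBy` and `parallelGroup` fields from the tasks file. The sketch below is an assumption about how that check might be written; `ready_tasks` is a hypothetical helper, and the task list is a hand-copied subset of `.tasks.json`.

```python
from typing import Optional

# Subset of .tasks.json, reduced to the fields this check needs.
tasks = [
    {"id": 43, "parallelGroup": "wave-4a", "blockedBy": [36, 37]},
    {"id": 45, "parallelGroup": "wave-4a", "blockedBy": [36, 37]},
    {"id": 44, "parallelGroup": None, "blockedBy": [39, 40, 41, 43, 45]},
    {"id": 46, "parallelGroup": "wave-5", "blockedBy": [44]},
    {"id": 47, "parallelGroup": "wave-5", "blockedBy": [37]},
    {"id": 48, "parallelGroup": "wave-5", "blockedBy": [44]},
]


def ready_tasks(tasks: list, done: set, group: Optional[str] = None) -> list:
    """Ids of unfinished tasks whose blockers are all done.

    If `group` is given, only tasks in that parallelGroup are returned.
    """
    ready = []
    for task in tasks:
        if task["id"] in done:
            continue
        if not set(task.get("blockedBy", [])) <= done:
            continue
        if group is not None and task.get("parallelGroup") != group:
            continue
        ready.append(task["id"])
    return ready


done = set(range(36, 43))  # Waves 1-3 merged: T36..T42
dag_only = ready_tasks(tasks, done)
wave_4a = ready_tasks(tasks, done, group="wave-4a")
after_4a = ready_tasks(tasks, done | {43, 45})
```

Note that T47 is unblocked by the DAG as soon as T37 is done, even though the plan schedules it in Wave 5; a controller that wants to follow the wave order should filter on `parallelGroup` as well as `blockedBy`.
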
@@ -0,0 +1,20 @@
{
"planPath": "docs/plans/2026-04-26-v2-phase2-implementation.md",
"tasks": [
{"id": 36, "subject": "T36: group_node schema + projector handlers", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
{"id": 37, "subject": "T37: guest_added / guest_removed event handlers", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
{"id": 38, "subject": "T38: relationship-seed service for first-co-appearance prompt", "status": "pending", "wave": 1, "parallelGroup": "wave-1"},
{"id": 39, "subject": "T39: interjection classifier service", "status": "pending", "wave": 2, "parallelGroup": "wave-2", "blockedBy": [37]},
{"id": 40, "subject": "T40: multi-entity state-update coordinator", "status": "pending", "wave": 2, "parallelGroup": "wave-2", "blockedBy": [37]},
{"id": 41, "subject": "T41: multi-witness memory write helper", "status": "pending", "wave": 2, "parallelGroup": "wave-2", "blockedBy": [37]},
{"id": 42, "subject": "T42: drawer guest add/remove + render", "status": "pending", "wave": 3, "parallelGroup": null, "blockedBy": [36, 37, 38]},
{"id": 43, "subject": "T43: multi-entity prompt assembly with guest activity, edges, group node", "status": "pending", "wave": 4, "parallelGroup": "wave-4a", "blockedBy": [36, 37]},
{"id": 45, "subject": "T45: per-POV summaries on close for each present witness", "status": "pending", "wave": 4, "parallelGroup": "wave-4a", "blockedBy": [36, 37]},
{"id": 44, "subject": "T44: multi-entity turn flow with interjection support", "status": "pending", "wave": 4, "parallelGroup": null, "blockedBy": [39, 40, 41, 43, 45]},
{"id": 46, "subject": "T46: witness filter test coverage for multi-entity scenarios", "status": "pending", "wave": 5, "parallelGroup": "wave-5", "blockedBy": [44]},
{"id": 47, "subject": "T47: bot_reset cascades to guest references in other chats", "status": "pending", "wave": 5, "parallelGroup": "wave-5", "blockedBy": [37]},
{"id": 48, "subject": "T48: Phase 2 documentation update", "status": "pending", "wave": 5, "parallelGroup": "wave-5", "blockedBy": [44]}
],
"lastUpdated": "2026-04-26T00:00:00Z",
"notes": "13 tasks across 6 waves (1, 2, 3, 4a, 4b, 5). Waves 1, 2, 4a, 5 are parallel-safe (file-disjoint within each). Waves 3 and 4b are single-task. Use Agent tool with isolation: 'worktree' to dispatch parallel tasks. Merge each wave's worktrees back into phase-2 before dispatching the next wave. See plan §Parallel-Execution Strategy for full guidance."
}