Joseph Doherty b8335895e1 docs: add Phase 2 implementation plan with parallel-safe waves

13 tasks across 6 waves (1, 2, 3, 4a, 4b, 5). Designed for parallel
subagent execution where file-disjointness allows.

Waves 1, 2, 4a, and 5 each contain 2-3 tasks that touch disjoint files
and can be dispatched concurrently via the Agent tool with
isolation: "worktree". Waves 3 (drawer guest support) and 4b (multi-
entity turn flow) are single-task because they touch hot files
(_drawer.html, turns.py) that cannot be safely co-modified.

Plan covers:
- T36: group_node schema + handlers (new migration 0008)
- T37: guest_added / guest_removed event handlers (modifies world.py)
- T38: relationship-seed service ("have they met?")
- T39: interjection classifier service
- T40: multi-entity state-update coordinator (6 directed pairs)
- T41: multi-witness memory write helper
- T42: drawer guest add/remove UI + render
- T43: multi-entity prompt assembly (extends T18)
- T44: multi-entity turn flow (rewrites post_turn)
- T45: multi-entity per-POV summaries on scene close
- T46: witness filter cross-coverage tests
- T47: bot_reset cascades to guest references
- T48: Phase 2 documentation update

Plan also documents:
- Worktree-per-subagent dispatch pattern using Agent isolation flag
- Merge ordering per wave (file-disjointness = conflict-free merges)
- Failure recovery (cancel failed parallel task, re-dispatch as solo)
- Conflict prevention checklist (verify Files sections disjoint per wave)

Tasks file (.tasks.json) carries dependency DAG with `blockedBy` and
`parallelGroup` so a future executing-plans run can dispatch correctly.

NOT EXECUTING. Plan only.

2026-04-26 15:37:07 -04:00


Roleplay Engine — Phase 2 Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task. Use superpowers-extended-cc:dispatching-parallel-agents for the parallel waves below.

Goal: Add multi-entity (3-entity) scene support: guest bot can be added to a host's chat; up to 3 entities present simultaneously (you + host + guest); turn flow handles silent witnesses, interjections, per-pair edges, and per-witness memory; drawer reflects guest state; scene close writes per-POV summaries for each present witness.

Architecture: Builds on Phase 1's event-sourced architecture. New event kinds (guest_added, guest_removed, group_node_initialized) carry the multi-entity state changes; existing handlers (edge_update, memory_written) already accept any source_id/target_id and witness mask, so most schema work is additive. The "have they met?" first-co-appearance prompt runs once per (botA, botB) pair and seeds initial inter-bot edges via existing edge_update events.

Tech Stack: Same as Phase 1 (Python 3.11+, FastAPI, HTMX, SQLite, Featherless). No new dependencies.

Source-of-truth references:

Phase 2 scope: requirements doc §13 "Phase 2 — multi-entity"
Behavioral details: requirements doc §6.2 (turn-taking with interjection), §7.5 (guest leaves), §8.5 (memory ownership), §11.2 (per-POV summaries on close)
Conventions: ../../CLAUDE.md §"Behavioral defaults"
Phase 1 plan (style, TDD pattern): 2026-04-26-v1-phase1-implementation.md

When a task says "see §X", that's the requirements doc unless stated otherwise.

Pre-flight

Branch: Create phase-2 from the latest main after Phase 1 has been merged. If Phase 1 is still in PR review, branch off phase-1 directly:

# Option A: after main has phase-1 merged
git checkout main && git pull && git checkout -b phase-2

# Option B: continue from phase-1 directly
git checkout phase-1 && git pull && git checkout -b phase-2

Schema baseline: Phase 1 leaves the DB at version 7. Phase 2 adds 0008_group_node.sql. No other migrations expected.

Pinned non-negotiables (carried forward from Phase 1):

State changes go through the event log. Use append_and_apply(conn, kind, payload) for the live path; apply_event only after a fresh append_event returning the new id.
Witness-filter every memory read at the SQL level (hard WHERE constraint; never a soft signal).
Edges are directed; botA → botB and botB → botA are independent records.
Per-POV scene summaries — never write omniscient narration.
TDD: every task starts with a failing test.
One commit per task minimum, more if it splits naturally.
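
To make the witness-filter rule above concrete, here is a minimal sketch of what a hard WHERE constraint looks like. The table layout (a memories table with one integer column per witness slot) and the helper name are assumptions made for illustration; the real Phase 1 schema may encode the witness mask differently. The point is only that the filter lives in the SQL, so unwitnessed rows never reach application code.

```python
import sqlite3

# Hypothetical schema: one column per witness slot (you, host, guest).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memories ("
    " id INTEGER PRIMARY KEY, owner_id TEXT NOT NULL, text TEXT NOT NULL,"
    " w_you INTEGER NOT NULL, w_host INTEGER NOT NULL, w_guest INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO memories (owner_id, text, w_you, w_host, w_guest) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("bot_a", "you and host talked alone", 1, 1, 0),
        ("bot_a", "all three shared dinner", 1, 1, 1),
    ],
)


def memories_witnessed_by_guest(conn, owner_id):
    """Return only rows whose guest witness slot is set (hypothetical helper)."""
    rows = conn.execute(
        "SELECT text FROM memories "
        "WHERE owner_id = ? AND w_guest = 1 ORDER BY id",
        (owner_id,),
    ).fetchall()
    return [r[0] for r in rows]


visible = memories_witnessed_by_guest(conn, "bot_a")
```

A post-query Python filter would produce the same list here, but it would count as a "soft signal" in the sense above: the unwitnessed row would already have been loaded.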

Verification before claiming done: Use superpowers-extended-cc:verification-before-completion — run the test command, paste actual output. Don't assume green.

Parallel-Execution Strategy

This plan is structured into 6 waves of tasks. Within a wave, tasks are designed to touch disjoint files so they can be executed by parallel subagents safely. Between waves, the controller (you, the controlling Claude session) merges each subagent's commits and verifies the suite stays green before dispatching the next wave.

How to dispatch a wave in parallel

Use the Agent tool with isolation: "worktree" so each subagent gets its own git worktree. The runtime cleans up the worktree automatically if no changes are made; otherwise it returns the path + branch for the controller to merge.

In a single message, dispatch all tasks in the wave:

Agent({
  description: "Wave 1 — T36 group_node schema",
  subagent_type: "general-purpose",
  isolation: "worktree",
  prompt: "<full task text from below>",
})
Agent({
  description: "Wave 1 — T37 guest events",
  subagent_type: "general-purpose",
  isolation: "worktree",
  prompt: "<full task text from below>",
})
Agent({
  description: "Wave 1 — T38 relationship-seed service",
  subagent_type: "general-purpose",
  isolation: "worktree",
  prompt: "<full task text from below>",
})

All three subagents start simultaneously, each working on a private worktree branched off phase-2. They cannot see each other's changes (no shared filesystem state) — that's the safety guarantee.

After a wave completes

Each subagent returns its worktree path and commit SHA.
Run a spec + quality reviewer subagent on each completed task (same pattern as Phase 1 — see superpowers-extended-cc:requesting-code-review).

Merge the wave into phase-2 in any order (file-disjointness guarantees no conflict). The loop below passes --no-ff so that each task lands as its own merge commit, even when a fast-forward would be possible:

git checkout phase-2
for branch in <wave-1-branches>; do
  git merge --no-ff "$branch" -m "merge: <task description>"
done

Run the full test suite on the merged phase-2. If it's red, the wave's mutual independence assumption was violated — bisect to find the offending pair, fix, re-merge.
Push phase-2 to gitea so the work is durable before the next wave starts.
Optionally clean up worktrees: git worktree remove .worktrees/<branch>.

Conflict prevention checklist (apply before dispatch)

For each parallel wave, verify the Files sections of all tasks in that wave have no overlapping paths. The waves below are designed to satisfy this; if you decide to add or merge tasks, re-check.

If a hot file (chat/web/turns.py, chat/services/prompt.py, chat/templates/_drawer.html) needs changes from multiple tasks, do not parallelize them — serialize within the wave or split into separate waves.
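
The disjointness check can be automated before dispatch. The following is a minimal sketch, not part of the plan's tooling: find_overlaps is a hypothetical helper, and the file lists are copied by hand from the Wave 1 "Files" sections of this plan.

```python
from itertools import combinations


def find_overlaps(wave_files):
    """Return (task_a, task_b, shared_paths) for every pair of tasks
    in the wave whose file sets intersect."""
    overlaps = []
    for (a, files_a), (b, files_b) in combinations(sorted(wave_files.items()), 2):
        shared = set(files_a) & set(files_b)
        if shared:
            overlaps.append((a, b, sorted(shared)))
    return overlaps


# Wave 1 file lists, copied from the task "Files" sections.
wave_1 = {
    "T36": [
        "chat/db/migrations/0008_group_node.sql",
        "chat/state/group_node.py",
        "tests/test_group_node.py",
    ],
    "T37": ["chat/state/world.py", "tests/test_guest_events.py"],
    "T38": [
        "chat/services/relationship_seed.py",
        "tests/test_relationship_seed.py",
    ],
}

wave_1_overlaps = find_overlaps(wave_1)
```

An empty result means the wave is safe to dispatch in parallel; any returned triple names the two tasks that must be serialized and the paths they share.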

Failure recovery

If one subagent in a parallel wave fails (test failures, blocked, infinite loop):

Do not block the wave on a failure. Cancel the failed subagent, merge the others' successful work, and re-dispatch the failed task as a single follow-up.
If a failure exposes a bad assumption shared by multiple tasks (e.g. an event-payload schema mismatch), pause the wave and revisit the plan.

Why each wave is parallel-safe

Wave	Tasks	Hot files touched	Disjoint?
1	T36, T37, T38	new files only + `chat/state/world.py` (T37 only)	✅
2	T39, T40, T41	new files only + `chat/services/memory_write.py` (T41 only)	✅
3	T42	`chat/web/drawer.py`, `chat/templates/_drawer.html`	(single task)
4a	T43, T45	`chat/services/prompt.py` (T43), `chat/services/scene_summarize.py` (T45)	✅
4b	T44	`chat/web/turns.py`, `chat/services/regenerate.py`	(single task; depends on 4a)
5	T46, T47, T48	new tests + `chat/state/entities.py` (T47) + docs (T48)	✅

Task overview

Wave 1  ─┬─ T36: group_node schema + handler
         ├─ T37: guest_added / guest_removed events
         └─ T38: relationship-seed service ("have they met?")

Wave 2  ─┬─ T39: interjection classifier service
         ├─ T40: multi-entity state-update coordinator
         └─ T41: multi-witness memory write helper

Wave 3  ─── T42: drawer guest support (add/remove + render guest state)

Wave 4a ─┬─ T43: multi-entity prompt assembly (extends assemble_narrative_prompt)
         └─ T45: multi-entity per-POV summaries on scene close

Wave 4b ─── T44: multi-entity turn flow integration (post_turn rewrite)

Wave 5  ─┬─ T46: witness filter test coverage (cross-witness scenarios)
         ├─ T47: bot reset cascades to guest scenes
         └─ T48: Phase 2 documentation update

Critical path: 6 sequential merge points. Total tasks: 13. Wall-clock parallelism advantage depends on subagent dispatch overhead, but in principle Wave 1's 3 tasks can run concurrently in ~the time of one task.
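
The wave structure can be recovered from a dependency map by simple layering. This sketch assumes a plain dict of blockers in which every task is blocked by all tasks of the previous wave; the real .tasks.json layout is not shown in this plan and may be finer-grained, and layer_tasks is a hypothetical helper.

```python
def layer_tasks(blocked_by):
    """Group tasks into dispatch layers: a task joins the first layer
    in which all of its blockers have already completed."""
    remaining = dict(blocked_by)
    done, layers = set(), []
    while remaining:
        ready = sorted(t for t, deps in remaining.items() if set(deps) <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown blocker")
        layers.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return layers


# Waves as listed in the task overview.
waves = [
    ["T36", "T37", "T38"],
    ["T39", "T40", "T41"],
    ["T42"],
    ["T43", "T45"],
    ["T44"],
    ["T46", "T47", "T48"],
]

# Assumption: each task is blocked by every task in the previous wave.
blocked_by = {}
for i, wave in enumerate(waves):
    for task in wave:
        blocked_by[task] = list(waves[i - 1]) if i else []

layers = layer_tasks(blocked_by)
```

Under that assumption the layering reproduces the six waves, which is where the six sequential merge points come from.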

Wave 1 — Foundation

These three tasks are fully independent: T36 adds new files only, T37 modifies chat/state/world.py (additive event handlers), T38 adds new files only. Dispatch all three in parallel.

Task 36: Group node schema + handler

Files:

Create: chat/db/migrations/0008_group_node.sql
Create: chat/state/group_node.py
Create: tests/test_group_node.py

Spec: Adds the group_node table (one row per chat, populated when all three entities are present in a scene) and a projector handler for the group_node_initialized event. The group node carries the shared summary, group dynamic, inside jokes, and active threads (Phase 3 will populate active_threads; for Phase 2, just summary and dynamic matter).

Step 1: Write the failing test

# tests/test_group_node.py
from chat.db.migrate import apply_migrations
from chat.db.connection import open_db
from chat.eventlog.log import append_event
from chat.eventlog.projector import project
from chat.state.group_node import get_group_node
import chat.state.entities  # noqa
import chat.state.world  # noqa
import chat.state.group_node  # noqa: F401  - registers handlers

def test_group_node_initialized_creates_row(tmp_path):
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        # Seed bot, you, chat (minimal world state — no scene yet)
        append_event(conn, kind="bot_authored", payload={
            "id": "bot_a", "name": "BotA", "persona": "...",
            "voice_samples": [], "traits": [], "backstory": "",
            "initial_relationship_to_you": "", "kickoff_prose": "",
        })
        append_event(conn, kind="chat_created", payload={
            "id": "chat_bot_a", "host_bot_id": "bot_a",
            "initial_time": "2026-04-26T20:00:00+00:00",
            "narrative_anchor": "Day 1", "weather": "",
        })
        append_event(conn, kind="group_node_initialized", payload={
            "chat_id": "chat_bot_a",
            "members": ["you", "bot_a", "bot_b"],
            "summary": "",
            "dynamic": "",
        })
        project(conn)
        gn = get_group_node(conn, "chat_bot_a")
        assert gn is not None
        assert gn["members"] == ["you", "bot_a", "bot_b"]
        assert gn["summary"] == ""

Step 2: Run test to verify it fails

.venv/bin/pytest tests/test_group_node.py -v

Expected: ModuleNotFoundError: No module named 'chat.state.group_node'.

Step 3: Write minimal implementation

chat/db/migrations/0008_group_node.sql:

CREATE TABLE group_node (
    chat_id TEXT PRIMARY KEY,
    members_json TEXT NOT NULL,
    summary TEXT NOT NULL DEFAULT '',
    dynamic TEXT NOT NULL DEFAULT '',
    threads_json TEXT NOT NULL DEFAULT '[]',
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);

chat/state/group_node.py:

from __future__ import annotations
import json
from sqlite3 import Connection
from chat.eventlog.projector import on
from chat.eventlog.log import Event


@on("group_node_initialized")
def _apply_group_node_initialized(conn: Connection, e: Event) -> None:
    p = e.payload
    conn.execute(
        "INSERT OR REPLACE INTO group_node "
        "(chat_id, members_json, summary, dynamic, threads_json) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            p["chat_id"],
            json.dumps(p["members"]),
            p.get("summary", ""),
            p.get("dynamic", ""),
            json.dumps(p.get("threads", [])),
        ),
    )


@on("group_node_updated")
def _apply_group_node_updated(conn: Connection, e: Event) -> None:
    """T45 calls this on scene close to rewrite summary + dynamic."""
    p = e.payload
    conn.execute(
        "UPDATE group_node SET summary = ?, dynamic = ?, updated_at = datetime('now') "
        "WHERE chat_id = ?",
        (p.get("summary", ""), p.get("dynamic", ""), p["chat_id"]),
    )


def get_group_node(conn: Connection, chat_id: str) -> dict | None:
    row = conn.execute(
        "SELECT chat_id, members_json, summary, dynamic, threads_json, updated_at "
        "FROM group_node WHERE chat_id = ?",
        (chat_id,),
    ).fetchone()
    if not row:
        return None
    return {
        "chat_id": row[0],
        "members": json.loads(row[1]),
        "summary": row[2],
        "dynamic": row[3],
        "threads": json.loads(row[4]),
        "updated_at": row[5],
    }

Step 4: Run test to verify it passes

.venv/bin/pytest tests/test_group_node.py -v

Expected: 1 passed.

Step 5: Commit

git add chat/db/migrations/0008_group_node.sql chat/state/group_node.py tests/test_group_node.py
git commit -m "feat: group_node schema + projector handlers"

Notes for the implementer:

Add a second test for group_node_updated: append init then update, assert summary and dynamic change but members stays.
Add a test for get_group_node returning None on a missing chat_id.
Schema version after migration: 8. The migration runner handles this automatically; no test assertion on schema_version.
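
The behavior those extra tests should pin down can be checked in isolation. The sketch below runs the migration DDL and the two handler statements from this task against an in-memory SQLite database; it bypasses the event log and projector entirely, so it illustrates the SQL only and is not a substitute for the real tests.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE group_node (
        chat_id TEXT PRIMARY KEY,
        members_json TEXT NOT NULL,
        summary TEXT NOT NULL DEFAULT '',
        dynamic TEXT NOT NULL DEFAULT '',
        threads_json TEXT NOT NULL DEFAULT '[]',
        updated_at TEXT NOT NULL DEFAULT (datetime('now'))
    )"""
)

members = ["you", "bot_a", "bot_b"]

# Same statement as _apply_group_node_initialized.
conn.execute(
    "INSERT OR REPLACE INTO group_node "
    "(chat_id, members_json, summary, dynamic, threads_json) "
    "VALUES (?, ?, ?, ?, ?)",
    ("chat_bot_a", json.dumps(members), "", "", json.dumps([])),
)

# Same statement as _apply_group_node_updated.
conn.execute(
    "UPDATE group_node SET summary = ?, dynamic = ?, "
    "updated_at = datetime('now') WHERE chat_id = ?",
    ("They argued about dinner.", "tense", "chat_bot_a"),
)

row = conn.execute(
    "SELECT members_json, summary, dynamic FROM group_node WHERE chat_id = ?",
    ("chat_bot_a",),
).fetchone()
missing = conn.execute(
    "SELECT 1 FROM group_node WHERE chat_id = ?", ("no_such_chat",)
).fetchone()
```

The update touches summary and dynamic only, so members_json survives unchanged, and a lookup for an unknown chat_id yields no row (which get_group_node turns into None).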

Task 37: Guest add / remove events + handlers

Files:

Modify: chat/state/world.py (add _apply_guest_added, _apply_guest_removed handlers; both update chats.guest_bot_id)
Create: tests/test_guest_events.py

Spec: Two new event kinds.

guest_added payload: {chat_id, guest_bot_id}. Handler sets chats.guest_bot_id = ?.
guest_removed payload: {chat_id}. Handler sets chats.guest_bot_id = NULL.

These are pure state mutations — no related side effects. The kickoff parse-and-confirm flow (T13, Phase 1) and the new T39 interjection / T42 drawer routes will append these events.

Step 1: Write the failing test

# tests/test_guest_events.py
from chat.db.migrate import apply_migrations
from chat.db.connection import open_db
from chat.eventlog.log import append_event
from chat.eventlog.projector import project
from chat.state.world import get_chat
import chat.state.entities  # noqa
import chat.state.world  # noqa


def test_guest_added_sets_guest_bot_id(tmp_path):
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        # Seed bot, chat
        append_event(conn, kind="bot_authored", payload={
            "id": "bot_a", "name": "BotA", "persona": "...",
            "voice_samples": [], "traits": [], "backstory": "",
            "initial_relationship_to_you": "", "kickoff_prose": "",
        })
        append_event(conn, kind="bot_authored", payload={
            "id": "bot_b", "name": "BotB", "persona": "...",
            "voice_samples": [], "traits": [], "backstory": "",
            "initial_relationship_to_you": "", "kickoff_prose": "",
        })
        append_event(conn, kind="chat_created", payload={
            "id": "chat_bot_a", "host_bot_id": "bot_a",
            "initial_time": "2026-04-26T20:00:00+00:00",
            "narrative_anchor": "Day 1", "weather": "",
        })
        append_event(conn, kind="guest_added", payload={
            "chat_id": "chat_bot_a", "guest_bot_id": "bot_b",
        })
        project(conn)
        chat = get_chat(conn, "chat_bot_a")
        assert chat["guest_bot_id"] == "bot_b"


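def test_guest_removed_clears_guest_bot_id(tmp_path):
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        # Same seeding as above: host bot, guest bot, chat
        for bot_id, name in (("bot_a", "BotA"), ("bot_b", "BotB")):
            append_event(conn, kind="bot_authored", payload={
                "id": bot_id, "name": name, "persona": "...",
                "voice_samples": [], "traits": [], "backstory": "",
                "initial_relationship_to_you": "", "kickoff_prose": "",
            })
        append_event(conn, kind="chat_created", payload={
            "id": "chat_bot_a", "host_bot_id": "bot_a",
            "initial_time": "2026-04-26T20:00:00+00:00",
            "narrative_anchor": "Day 1", "weather": "",
        })
        # Add the guest, then remove it again
        append_event(conn, kind="guest_added", payload={
            "chat_id": "chat_bot_a", "guest_bot_id": "bot_b",
        })
        append_event(conn, kind="guest_removed", payload={
            "chat_id": "chat_bot_a",
        })
        project(conn)
        chat = get_chat(conn, "chat_bot_a")
        assert chat["guest_bot_id"] is None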

Step 3: Implementation

In chat/state/world.py, add the two handlers next to _apply_chat_created:

@on("guest_added")
def _apply_guest_added(conn: Connection, e: Event) -> None:
    p = e.payload
    conn.execute(
        "UPDATE chats SET guest_bot_id = ? WHERE id = ?",
        (p["guest_bot_id"], p["chat_id"]),
    )


@on("guest_removed")
def _apply_guest_removed(conn: Connection, e: Event) -> None:
    p = e.payload
    conn.execute(
        "UPDATE chats SET guest_bot_id = NULL WHERE id = ?",
        (p["chat_id"],),
    )

Step 5: Commit

git add chat/state/world.py tests/test_guest_events.py
git commit -m "feat: guest_added / guest_removed event handlers"

Notes:

2 tests minimum (added, removed). Optional third: idempotent re-add (overwrites cleanly).
Don't add any UI here — T42 handles UI.

Task 38: Relationship-seed service ("have they met?")

Files:

Create: chat/services/relationship_seed.py
Create: tests/test_relationship_seed.py

Spec: Per requirements §5.2: when two bots first co-appear in a chat, prompt the user with "Have they met before? If yes, write a short prose seed describing how." The seed is parsed via classifier into structured botA ↔ botB edge content (summary + initial knowledge facts).

This task adds the service layer only. T39 (interjection) doesn't touch this; T42 (drawer guest UI) calls it via a route added there. So at the service level, we just expose:

async def seed_inter_bot_edges(
    client: LLMClient,
    *,
    classifier_model: str,
    bot_a_id: str,
    bot_a_name: str,
    bot_b_id: str,
    bot_b_name: str,
    relationship_prose: str,  # user-supplied prose; empty = "they haven't met"
    timeout_s: float = 30.0,
) -> RelationshipSeed:
    """Parse user-supplied prose into structured edge content for both
    directed pairs (bot_a → bot_b and bot_b → bot_a). Return the
    RelationshipSeed; caller is responsible for emitting two edge_update
    events."""

RelationshipSeed:

from pydantic import BaseModel, Field


class RelationshipSeed(BaseModel):
    a_to_b_summary: str = ""
    a_to_b_knowledge_facts: list[str] = Field(default_factory=list)
    a_to_b_affinity_delta: int = 0  # signed, -10..+10 typical
    a_to_b_trust_delta: int = 0
    b_to_a_summary: str = ""
    b_to_a_knowledge_facts: list[str] = Field(default_factory=list)
    b_to_a_affinity_delta: int = 0
    b_to_a_trust_delta: int = 0

If relationship_prose is empty/whitespace, short-circuit and return an empty RelationshipSeed (they haven't met → fresh edges with default 50/50).

Step 1: Failing test

import json

import pytest
from chat.llm.mock import MockLLMClient
from chat.services.relationship_seed import seed_inter_bot_edges, RelationshipSeed


@pytest.mark.asyncio
async def test_seed_parses_canned_prose():
    canned = json.dumps({
        "a_to_b_summary": "BotA and BotB went to college together.",
        "a_to_b_knowledge_facts": ["BotB has a younger brother."],
        "a_to_b_affinity_delta": 5,
        "a_to_b_trust_delta": 3,
        "b_to_a_summary": "BotB sees BotA as the responsible one.",
        "b_to_a_knowledge_facts": ["BotA was once a TA."],
        "b_to_a_affinity_delta": 4,
        "b_to_a_trust_delta": 5,
    })
    mock = MockLLMClient(canned=[canned])
    seed = await seed_inter_bot_edges(
        mock, classifier_model="x",
        bot_a_id="bot_a", bot_a_name="BotA",
        bot_b_id="bot_b", bot_b_name="BotB",
        relationship_prose="They went to college together; BotB still sees BotA as the responsible one.",
    )
    assert "college" in seed.a_to_b_summary
    assert seed.a_to_b_affinity_delta == 5


@pytest.mark.asyncio
async def test_seed_empty_prose_returns_empty():
    mock = MockLLMClient(canned=[])  # never called
    seed = await seed_inter_bot_edges(
        mock, classifier_model="x",
        bot_a_id="bot_a", bot_a_name="BotA",
        bot_b_id="bot_b", bot_b_name="BotB",
        relationship_prose="",
    )
    assert seed == RelationshipSeed()

Step 3: Minimal impl

Wraps classify() from chat.llm.classify with a RelationshipSeed schema and a system prompt explaining the task.
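
As a rough shape for that wrapper, the sketch below shows the short-circuit on blank prose followed by a single classifier call. Everything here is a stand-in: SeedSketch replaces the pydantic RelationshipSeed with a trimmed dataclass so the example is self-contained, and classify_fn is a hypothetical callable, since the real signature of classify() in chat.llm.classify is not shown in this plan.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SeedSketch:
    """Trimmed stand-in for RelationshipSeed (the plan uses pydantic)."""
    a_to_b_summary: str = ""
    a_to_b_affinity_delta: int = 0
    b_to_a_summary: str = ""
    b_to_a_affinity_delta: int = 0


async def seed_sketch(classify_fn, *, bot_a_name, bot_b_name, relationship_prose):
    # Blank prose means "they haven't met": skip the LLM call entirely.
    if not relationship_prose.strip():
        return SeedSketch()
    system = (
        f"Extract how {bot_a_name} and {bot_b_name} see each other. "
        "Return one summary and one affinity delta per direction."
    )
    fields = await classify_fn(system=system, user=relationship_prose)
    return SeedSketch(**fields)


async def fake_classify(*, system, user):
    # Hypothetical classifier double returning canned structured output.
    return {"a_to_b_summary": "Old college friends.", "a_to_b_affinity_delta": 5}


empty = asyncio.run(seed_sketch(
    fake_classify, bot_a_name="BotA", bot_b_name="BotB",
    relationship_prose="   ",
))
seeded = asyncio.run(seed_sketch(
    fake_classify, bot_a_name="BotA", bot_b_name="BotB",
    relationship_prose="They met in college.",
))
```

Whitespace-only prose returns the default seed without touching the classifier, which is what lets the empty-prose test pass a mock with no canned responses.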

Step 5: Commit

git add chat/services/relationship_seed.py tests/test_relationship_seed.py
git commit -m "feat: relationship-seed service for first-co-appearance prompt"

Wave 2 — Services

After Wave 1 merges, dispatch Wave 2 in parallel. T39 and T40 create new files only; T41 modifies chat/services/memory_write.py (additive: it adds a new function alongside the existing record_turn_memory).

Task 39: Interjection classifier service

Files:

Create: chat/services/interjection.py
Create: tests/test_interjection.py

Spec: Per requirements §6.2: when a guest is present and the addressee bot has just spoken, decide whether the non-addressee bot interjects. Classifier returns {should_interject: bool, reason: str}. Caller (T44 turn flow) generates the interjection beat as a brief follow-on response if should_interject.

Public API:

from pydantic import BaseModel


class InterjectionDecision(BaseModel):
    should_interject: bool = False
    reason: str = ""


async def detect_interjection(
    client: LLMClient,
    *,
    classifier_model: str,
    addressee_name: str,
    addressee_just_said: str,
    silent_witness_name: str,
    silent_witness_persona: str,
    silent_witness_edge_to_addressee: dict,  # {affinity, trust, summary}
    silent_witness_edge_to_you: dict,
    you_just_said: str,
    timeout_s: float = 30.0,
) -> InterjectionDecision:
    """Decide whether the silent witness bot interjects after the addressee
    finishes speaking. Conservative bias — most turns should NOT interject
    (return False). Trigger only when the witness's character would
    plausibly speak up: jealousy, surprise, agreement worth voicing,
    correcting a falsehood, etc.
    """

Classifier system prompt should explicitly bias toward should_interject=false (per spec: "addressee gets the floor"; interjection is the exception).

Tests: 3 minimum.

Mock returns {should_interject: true, reason: "..."} → result is True.
Mock returns {should_interject: false} → result is False.
Classifier failure → fallback default (should_interject=false, reason="fallback").
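
The third case (classifier failure) is the one most easily gotten wrong, so here is a minimal sketch of the fallback shape. DecisionSketch is a dataclass stand-in for InterjectionDecision, and classify_fn is a hypothetical callable; the real classifier wrapper and its error types are not specified in this plan.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class DecisionSketch:
    """Stand-in for InterjectionDecision (the plan uses pydantic)."""
    should_interject: bool = False
    reason: str = ""


async def decide_with_fallback(classify_fn, prompt):
    """Call the classifier; on any failure fall back to 'no interjection'."""
    try:
        raw = await classify_fn(prompt)
        return DecisionSketch(
            should_interject=bool(raw["should_interject"]),
            reason=str(raw.get("reason", "")),
        )
    except Exception:
        # Conservative default: the addressee keeps the floor.
        return DecisionSketch(should_interject=False, reason="fallback")


async def ok_classifier(prompt):
    return {"should_interject": True, "reason": "jealousy"}


async def broken_classifier(prompt):
    raise TimeoutError("classifier timed out")


interjects = asyncio.run(decide_with_fallback(ok_classifier, "..."))
fell_back = asyncio.run(decide_with_fallback(broken_classifier, "..."))
```

Malformed output (for example a missing should_interject key) takes the same fallback path as a timeout, so a bad classifier response can never force an interjection.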

Commit: feat: interjection classifier service

Task 40: Multi-entity state-update coordinator

Files:

Create: chat/services/multi_state_update.py
Create: tests/test_multi_state_update.py

Spec: Wraps the existing chat.services.state_update.compute_state_update (single-pair) into a coordinator that runs state updates for all directed pairs of present entities. With 3 entities (you, host, guest), that's 6 pairs:

you → host, host → you
you → guest, guest → you
host → guest, guest → host

Returns a list of (source_id, target_id, StateUpdate) tuples; caller (T44) emits one edge_update event per tuple via append_and_apply.

Public API:

async def compute_state_updates_for_present(
    client: LLMClient,
    *,
    classifier_model: str,
    present_ids: list[str],     # e.g. ["you", "bot_a", "bot_b"]
    present_names: dict[str, str],  # id -> display name
    personas: dict[str, str],   # id -> persona blob
    prior_edges: dict[tuple[str, str], dict],  # (src, tgt) -> {affinity, trust, summary}
    recent_dialogue: list[dict],  # [{speaker, text}, ...]
    timeout_s: float = 30.0,
) -> list[tuple[str, str, StateUpdate]]:
    """Run compute_state_update for every directed pair where source != target.
    Returns a list of (source_id, target_id, update) tuples. Self-pairs
    (e.g. "you" → "you") are skipped.
    """

Implementation: nested loops over present_ids, sequential calls to compute_state_update (parallel calls would exceed the Featherless 2-connection cap from the FeatherlessClient semaphore).
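
A sketch of the pair enumeration and the sequential await loop follows. It uses itertools.permutations in place of hand-written nested loops (equivalent as long as present_ids contains no duplicates), and update_fn is a hypothetical stand-in for a call to compute_state_update with its arguments already bound.

```python
import asyncio
from itertools import permutations


def directed_pairs(present_ids):
    """All ordered (source, target) pairs with source != target.
    Assumes present_ids has no duplicate entries."""
    return list(permutations(present_ids, 2))


async def run_sequentially(present_ids, update_fn):
    results = []
    for source_id, target_id in directed_pairs(present_ids):
        # Await one pair at a time; deliberately no asyncio.gather here.
        update = await update_fn(source_id, target_id)
        results.append((source_id, target_id, update))
    return results


async def fake_update(source_id, target_id):
    # Hypothetical double for compute_state_update.
    return {"affinity_delta": 0}


three = asyncio.run(run_sequentially(["you", "bot_a", "bot_b"], fake_update))
two = asyncio.run(run_sequentially(["you", "bot_a"], fake_update))
```

With n present entities this yields n × (n − 1) updates: 2 for you + host, 6 once a guest joins.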

Tests: 3 minimum.

With 2 present (you, host) → returns 2 updates (existing 1A/2D parity).
With 3 present (you, host, guest) → returns 6 updates, one per directed non-self pair.
Failures in one pair don't kill the whole batch (per-pair compute_state_update already has a default fallback).

Commit: feat: multi-entity state-update coordinator

Task 41: Multi-witness memory write helper

Files:

Modify: chat/services/memory_write.py (add record_turn_memory_for_present alongside existing record_turn_memory; do NOT remove or change record_turn_memory)
Add tests to: tests/test_memory_write.py

Spec: Currently Phase 1's record_turn_memory(conn, *, chat_id, host_bot_id, narrative_text, ...) writes a single memory event for the host bot's POV. With a guest present, we need:

One memory in the host's store (witness mask [1, 1, 1] if you/host/guest present)
One memory in the guest's store (same witness mask, owner = guest_bot_id)

"You" still doesn't have a memory store in v1 (per §5.4 / §11.2).

New helper:

def record_turn_memory_for_present(
    conn,
    *,
    chat_id: str,
    host_bot_id: str,
    guest_bot_id: str | None,
    narrative_text: str,
    scene_id: int | None = None,
    chat_clock_at: str | None = None,
    source: str = "direct",
    significance: int = 1,
) -> dict[str, tuple[int, int]]:
    """Write a memory_written event for each present bot witness (host
    always; guest if guest_bot_id is not None). Returns {bot_id:
    (event_id, memory_id)}.

    Witness mask is [1, 1, 1] when guest is present, [1, 1, 0] otherwise
    (mirrors Phase 1 single-bot behavior when guest_bot_id is None).
    """

Implementation: appends one memory_written event per present bot, calling append_and_apply for each, and queries the resulting memories.id per owner+chat just like Phase 1's record_turn_memory.
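
The owner and mask selection can be separated from the event-log writes and tested as a pure function. This is a sketch only: memory_write_plan is a hypothetical helper, and the mask order (you, host, guest) is inferred from the docstring above rather than stated by the Phase 1 schema.

```python
def memory_write_plan(host_bot_id, guest_bot_id=None):
    """Return {owner_bot_id: witness_mask} for one turn.

    Mask order is assumed to be (you, host, guest)."""
    mask = [1, 1, 1] if guest_bot_id is not None else [1, 1, 0]
    plan = {host_bot_id: mask}
    if guest_bot_id is not None:
        # Separate list object per owner so later mutation cannot leak.
        plan[guest_bot_id] = list(mask)
    return plan


solo = memory_write_plan("bot_a")
with_guest = memory_write_plan("bot_a", "bot_b")
```

record_turn_memory_for_present would then emit one memory_written event per entry in the plan, which keeps the returned dict keys aligned with the owners that were written.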

Tests: 3 minimum, added to tests/test_memory_write.py:

With guest_bot_id=None, behaves identically to record_turn_memory (one memory for host, witness [1, 1, 0]).
With guest_bot_id="bot_b", writes two memories — one each for host and guest, both with witness [1, 1, 1].
Returned dict keys match {host_bot_id, guest_bot_id} (or just {host_bot_id} when no guest).

Commit: feat: multi-witness memory write helper

Wave 3 — Drawer guest support (single task)

This wave is one task because all Phase 2 drawer work touches the same two files (chat/web/drawer.py and chat/templates/_drawer.html). Splitting would force serial execution with conflict resolution. Single-task wave runs alone.

Task 42: Drawer guest support (add/remove + render)

Files:

Modify: chat/web/drawer.py (add POST /chats/{chat_id}/drawer/guest/add, POST /chats/{chat_id}/drawer/guest/remove; extend drawer GET handler to query guest state when present)
Modify: chat/templates/_drawer.html (render guest activity, guest edges, group node summary; add "Add guest" form and "Remove guest" button when applicable)
Create: tests/test_drawer_guest.py

Spec:

GET /chats/{chat_id}/drawer (extend, don't replace):

Read chat["guest_bot_id"] from the existing get_chat query.
If guest present: also fetch get_bot(conn, guest_bot_id), get_activity(conn, guest_bot_id), edges in both host ↔ guest directions, edges in both you ↔ guest directions, and get_group_node(conn, chat_id).
Pass all of this to the template.

Template changes:

New section "Guest" rendering guest's name, activity, and the four edges involving the guest.
New section "Group" rendering group_node.summary and group_node.dynamic when present.
"Add guest" button → expands form with: bot selector (dropdown of authored bots not currently in this chat) + relationship prose textarea (the "have they met?" prompt).
"Remove guest" button visible when a guest is present.

POST /chats/{chat_id}/drawer/guest/add route:

Read form: guest_bot_id, relationship_prose.
404 if chat or guest_bot is missing.
400 if guest_bot_id == host_bot_id.
400 if a guest is already present.
Call seed_inter_bot_edges (T38) with the prose. May produce empty seed if prose is blank.
Append events: guest_added, then up to 2 edge_update events (host ↔ guest deltas from the seed). Use append_and_apply for each.
If all 3 entities are now present and no group_node row exists for this chat, append group_node_initialized with members=[you, host, guest] and empty summary/dynamic.
Return refreshed drawer partial.
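
The guard clauses at the top of the add route can be sketched as a pure function, which makes the 404/400 ordering easy to test without a web client. validate_guest_add is a hypothetical helper, and plain dicts stand in for whatever row type get_chat and get_bot actually return.

```python
def validate_guest_add(chat, guest_bot, guest_bot_id):
    """Return an HTTP status code to reject with, or None to proceed.

    chat and guest_bot are the looked-up rows (None when missing)."""
    if chat is None or guest_bot is None:
        return 404
    if guest_bot_id == chat["host_bot_id"]:
        return 400
    if chat.get("guest_bot_id") is not None:
        return 400
    return None


chat_row = {"id": "chat_bot_a", "host_bot_id": "bot_a", "guest_bot_id": None}
guest_row = {"id": "bot_b"}

ok = validate_guest_add(chat_row, guest_row, "bot_b")
missing_bot = validate_guest_add(chat_row, None, "bot_zzz")
host_as_guest = validate_guest_add(chat_row, {"id": "bot_a"}, "bot_a")
already_present = validate_guest_add(
    {**chat_row, "guest_bot_id": "bot_c"}, guest_row, "bot_b"
)
```

Only when this returns None should the route go on to call seed_inter_bot_edges and append events, so a rejected request never spends a classifier call.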

POST /chats/{chat_id}/drawer/guest/remove route:

404 if chat missing; 400 if no guest present.
Append scene_closed for the active scene (per §7.5: removing the guest closes the current scene).
Append guest_removed.
(Per §7.5 the host's chat then implicitly opens a new scene with you+host. For Phase 2, leave that as a manual "next user message creates the new scene" — same as Phase 1 mid-chat reset semantics. Phase 3 may auto-open.)
Return refreshed drawer partial.

Tests (tests/test_drawer_guest.py): 6 minimum.

GET drawer with no guest → no "Guest" section in body.
POST add guest → 303-or-200 with refreshed drawer; chat.guest_bot_id is set; group_node row created; relationship-seed mock returns canned values; edges have the seeded values.
POST add guest with empty relationship_prose → guest added; seed_inter_bot_edges short-circuits; edges remain at default 50/50.
POST add guest when one is already present → 400.
POST remove guest → guest_bot_id NULL, scene_closed event written.
GET drawer with guest present → "Guest" section + group_node summary visible.

Commit: feat: drawer guest add/remove + render

Notes for implementer:

The guest-bot-selector dropdown lists bots from list_bots(conn) minus the host. Don't filter for "bots not in any chat" — guests can be in multiple chats simultaneously (each chat has its own scene state).
The "have they met?" prose textarea is the per-pair prompt. v1 only fires it on first co-appearance globally; for v2, fire it every time a (host, guest) pair has no existing host → guest edge. After the first add, the edge exists, so subsequent adds skip the prose (or render it disabled with "you've already met"). Treat this as Phase 2.5 polish if it gets fiddly — for T42 just always show the prose textarea, blank by default.
The drawer route already uses Depends(get_conn) and templates; reuse the existing dependency and TEMPLATES instance.

Wave 4a — Multi-entity prompt + scene close (parallel)

T43 and T45 touch different files (prompt.py and scene_summarize.py). Dispatch both in parallel.

Task 43: Multi-entity prompt assembly

Files:

Modify: chat/services/prompt.py (extend assemble_narrative_prompt to handle a guest_id parameter and fetch guest activity, guest edge, group node into the prompt blocks)
Add tests to: tests/test_prompt.py

Spec: The current assemble_narrative_prompt(conn, *, chat_id, speaker_bot_id, addressee="you", ...) only handles you+host. Extend:

Accept a guest_id: str | None = None parameter (auto-fetched from chat.guest_bot_id if not passed; explicit override for tests).
When guest_id is provided:
- Activity block includes the guest's activity (get_activity(conn, guest_id)).
- If speaker_bot_id == guest_id, the addressee defaults to "you" but caller can override.
- "Speaker's other edges" SHOULD-tier block includes speaker → non-addressee (e.g., host → guest if speaker is host and addressee is you).
- MUST-tier identity block unchanged (still just speaker).
- Group-node summary becomes a SHOULD-tier block when all three are present (after MUST, before retrieved memories).
Token budget tier ordering unchanged.
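
For orientation, here is a toy model of tiered trimming that matches the expectations in this task: MUST blocks always survive, and lower tiers are dropped first. It is not the existing prompt.py logic. The drop order (NICE before SHOULD), the word-count token estimate, and the function name are all assumptions made for illustration.

```python
def trim_to_budget(blocks, budget):
    """blocks: list of (tier, name, text) in prompt order.

    Drops NICE blocks first, then SHOULD blocks, until the rough token
    count fits the budget. MUST blocks are never dropped."""
    def cost(bs):
        return sum(len(text.split()) for _, _, text in bs)  # crude word count

    kept = list(blocks)
    for tier in ("NICE", "SHOULD"):
        # Drop from the end so earlier blocks of a tier survive longer.
        for block in reversed([b for b in kept if b[0] == tier]):
            if cost(kept) <= budget:
                return kept
            kept.remove(block)
    return kept


blocks = [
    ("MUST", "speaker_identity", "a b c"),
    ("SHOULD", "group_node_summary", "d e"),
    ("NICE", "guest_activity", "f g h"),
]

roomy = [name for _, name, _ in trim_to_budget(blocks, 100)]
tight = [name for _, name, _ in trim_to_budget(blocks, 5)]
tightest = [name for _, name, _ in trim_to_budget(blocks, 3)]
```

Placing guest activity in NICE and the group-node summary in SHOULD follows the tiers named in this task; under a tight budget the guest activity goes first and the speaker identity is untouched.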

Tests: 4 minimum, added to tests/test_prompt.py:

With guest_id=None, output matches existing 2-entity behavior (regression).
With guest_id="bot_b" present and group_node populated, the assembled system message contains: speaker identity, guest activity, group_node summary, host→guest edge for the speaker.
Speaker is the guest (speaker_bot_id == guest_id), addressee="you" → guest's edges and group node correctly oriented.
Tight budget forces NICE-trim of guest activity → MUST blocks (speaker identity, edge_to_addressee, last 4 turns) survive.

Commit: feat: multi-entity prompt assembly with guest activity, edges, group node

Task 45: Multi-entity per-POV summaries on scene close

Files:

Modify: chat/services/scene_summarize.py (extend apply_scene_close_summary to write per-POV summaries for each present witness with a memory store, not just host)
Modify: tests in tests/test_per_pov_summary.py

Spec: Phase 1's apply_scene_close_summary only summarizes from the host bot's POV. For Phase 2:

Determine present witnesses with memory stores: host always; guest if chat.guest_bot_id is not None.
For each, generate an independent per-POV summary via summarize_scene (the existing classifier wrapper). Each call uses that bot's persona, you_name, prior bot → you edge summary, and the same dialogue.
Update each owner's memories of the closing scene with their per-POV summary.
Update all directed bot → you edges with per-POV-derived summary content.
If a group_node exists for this chat, also append a group_node_updated event with a new summary and dynamic derived from the group view. The full approach runs summarize_scene once more with bot_name="group", bot_persona="all participants" to produce a meta-summary. For v1 simplicity, the meta-summary can instead be a naive concatenation of the host's and guest's per-POV summaries; the full LLM-merged group view is deferred to Phase 2.5.
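
A minimal sketch of the per-witness loop and the naive group concatenation described above. The function names and the summarize callable are hypothetical stand-ins; the real summarize_scene takes more inputs (persona, you_name, prior edge summary) that are omitted here.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical stand-in for summarize_scene: (bot_id, dialogue) -> summary text.
Summarizer = Callable[[str, str], str]


def per_pov_summaries(
    host_id: str,
    guest_id: str | None,
    dialogue: str,
    summarize: Summarizer,
) -> dict[str, str]:
    """Produce one independent summary per bot witness (host always, guest if present)."""
    witnesses = [host_id]
    if guest_id is not None:
        witnesses.append(guest_id)
    return {bot_id: summarize(bot_id, dialogue) for bot_id in witnesses}


def naive_group_summary(summaries: dict[str, str]) -> str:
    """Concatenate per-POV summaries, labelled by owner, as the v1 group view."""
    return "\n\n".join(f"[{bot_id}] {text}" for bot_id, text in summaries.items())
```

Keeping the summarizer as an injected callable makes the "called 2x with a guest" test straightforward to write with a mock.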

Tests: 4 minimum, added to tests/test_per_pov_summary.py:

With no guest, behavior matches Phase 1 (regression test).
With guest, apply_scene_close_summary calls summarize_scene twice (one per bot witness) — assert mock called 2x.
After close, each bot's memories of the closed scene have their respective per-POV summary (different text).
With group_node present, after close get_group_node(conn, chat_id).summary is updated.

Commit: feat: per-POV summaries on close for each present witness

Wave 4b — Turn flow integration (single task; depends on 4a)

T44 ties everything together. It modifies chat/web/turns.py (post_turn) and chat/services/regenerate.py to use the new multi-entity primitives. It must run after Wave 4a is merged, so that assemble_narrative_prompt accepts guest_id and apply_scene_close_summary handles the guest.

Task 44: Multi-entity turn flow

Files:

Modify: chat/web/turns.py (rewrite post_turn to: parse turn → optionally close scene → assemble prompt with guest → narrative stream → write memories for ALL witnesses → state updates for ALL pairs → interjection check + interjection narrative if needed)
Modify: chat/services/regenerate.py (mirror the changes — regenerated turn rebuilds with guest in scope)
Modify: tests in tests/test_turn_flow.py (add multi-entity scenarios)

Spec: Refactored post_turn flow:

1. Validate prose (existing 400 check).
2. Look up chat, host_bot, guest_bot (None if no guest).
3. Parse turn (existing parse_turn).
4. Append user_turn event.
5. Append assistant_turn_started.
6. Detect scene close (existing path; runs even with guest).
7. (Recent dialogue read with multi-witness in mind — same query.)
8. Determine ADDRESSEE (simplest Phase 2 heuristic): addressee is host unless
   prose explicitly names guest_bot.name. Pass to assemble_narrative_prompt.
9. Assemble narrative prompt with speaker=addressee, guest_id passed.
10. Stream narrative; broadcast tokens; commit assistant_turn (existing).
11. Write memories: record_turn_memory_for_present(host, guest).
12. State updates: compute_state_updates_for_present, then append_and_apply
    one edge_update per pair.
13. INTERJECTION CHECK (only if guest present; the silent witness is the bot that was not the addressee):
    a. Call detect_interjection with the silent witness as candidate.
    b. If should_interject: assemble narrative prompt with speaker=silent_witness,
       addressee=host (or whoever just spoke), and instruct briefly.
    c. Stream second narrative; broadcast as second turn_html; commit second
       assistant_turn event.
    d. Run state updates + memory writes for the interjection turn too
       (smaller scope — just the interjector's outgoing edges + memories).
14. Scene close summary (existing path; now multi-witness via T45).
15. Broadcast turn_html for primary + interjection (if any).
16. Return 204.

Addressee heuristic (Phase 2 v1): simple substring match on bot names. If both names appear or neither: addressee defaults to host. Phase 2.5 / Phase 3 may improve with a classifier call.
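
The heuristic above might look like the following sketch. The function name detect_addressee and the "host"/"guest" return labels are hypothetical, and case-insensitive matching is an assumption; the plan only specifies a substring match on bot names.

```python
from __future__ import annotations


def detect_addressee(prose: str, host_name: str, guest_name: str | None) -> str:
    """Return "guest" only when the prose names the guest and not the host."""
    if guest_name is None:
        return "host"  # no guest in the chat: host is the only possible addressee
    text = prose.casefold()  # assumption: names are matched case-insensitively
    names_host = host_name.casefold() in text
    names_guest = guest_name.casefold() in text
    # Both names or neither name: fall back to the host, per the plan.
    return "guest" if names_guest and not names_host else "host"
```

A plain substring match will also fire on names embedded in longer words (a bot named "Al" matches "also"), which is one reason a classifier call is listed as a later improvement.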

Cancel & truncated: unchanged from Phase 1 — both halves of a streaming turn (primary + interjection) cancel together.

regenerate.py changes: parallel to turns.py, with multi-entity prompt assembly, multi-witness memory, and multi-pair state update. Interjection regeneration is deferred to Phase 2.5 (in Phase 2, regenerate only rebuilds the addressee's turn).

Tests added to tests/test_turn_flow.py: 5 minimum.

Single-bot turn (no guest): full suite still passes (regression).
Multi-bot turn, no interjection: post_turn produces 1 user_turn + 1 assistant_turn + 6 edge_updates + 2 memory_written events. Mock interjection returns should_interject=false.
Multi-bot turn, with interjection: produces user_turn + 2 assistant_turns + 8 edge_updates (6 for the primary turn + 2 for the interjector's outgoing edges, per step 13d) + 4 memory_written events.
Multi-bot turn, scene close fires: scene_closed + multi-POV summaries written (per T45).
Addressee detection: prose "BotB, what do you think?" routes to BotB as speaker.
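
The edge_update counts in these tests follow from the number of directed pairs among the present entities. A small sketch of that arithmetic, with hypothetical helper names and entity ids:

```python
from itertools import permutations


def directed_pairs(entities):
    """All ordered (source, target) pairs among distinct entities."""
    return list(permutations(entities, 2))


def outgoing_pairs(entities, source):
    """Only the pairs whose source is the given entity (e.g. the interjector)."""
    return [(a, b) for a, b in directed_pairs(entities) if a == source]
```

For three entities (you, host, guest) there are 3 × 2 = 6 directed pairs, and any single entity has 2 outgoing pairs, which matches the narrower state-update scope described for the interjection turn in step 13d.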

Commit: feat: multi-entity turn flow with interjection support

Notes for implementer:

This task is the largest in Phase 2 by line count. Budget for ~150-300 lines of changes across turns.py and tests. The implementer should split commits if it helps clarity (one commit for primary turn, one for interjection, one for tests).
Update the existing _seed_chat helper in tests/test_turn_flow.py to optionally seed a guest, and add _seed_chat_with_guest if cleaner.
The fixture for the LLM mock now needs to provide canned responses for: parse_turn + scene_close_detect + narrative + state_updates×6 + interjection_decision + (optionally) interjection_narrative + state_updates×2 (interjection's outgoing only).
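
One way to keep the mock fixture honest is to generate the expected call sequence rather than hand-writing it. The sketch below is illustrative only: the function name and the string labels are hypothetical, and the ordering simply follows the list in the note above.

```python
def expected_llm_calls(interject: bool) -> list[str]:
    """Ordered labels of the LLM calls a multi-entity turn is expected to make."""
    calls = ["parse_turn", "scene_close_detect", "narrative"]
    calls += ["state_update"] * 6  # one per directed pair among three entities
    calls.append("interjection_decision")
    if interject:
        calls.append("interjection_narrative")
        calls += ["state_update"] * 2  # interjector's outgoing edges only
    return calls
```

A fixture could pop canned responses in this order and fail the test if the queue is not empty at the end of the turn.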

Wave 5 — Polish (parallel)

Three independent tasks. Dispatch all three in parallel after Wave 4b merges.

Task 46: Witness filter test coverage

Files:

Create: tests/test_witness_filter_multi.py

Spec: Phase 1 tested witness filtering with single-bot scenarios. Phase 2 needs explicit tests for the cross-witness cases:

Memory with witness [1, 1, 0] (order: you, host, guest): visible to host, not guest (when guest queries from their POV).
Memory with witness [0, 1, 1]: visible to host and guest, not "you".
Secondhand-source memories: source: "told_by:bot_a", witness flag for bot_b set, reliability < 1.0.

5 tests minimum.
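
A minimal sketch of the visibility rule these tests exercise. The witness vector order (you, host, guest) is an assumption inferred from the two examples above, and visible_to is a hypothetical helper, not the engine's actual filter.

```python
WITNESS_ORDER = ("you", "host", "guest")  # assumed slot order of the witness vector


def visible_to(witness, entity):
    """True when the entity's slot in the witness vector is set."""
    return bool(witness[WITNESS_ORDER.index(entity)])
```

For example, visible_to([1, 1, 0], "guest") is False, so a guest querying from their own POV would not retrieve that memory.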

Commit: test: witness filter coverage for multi-entity scenarios

Task 47: Bot reset cascades to guest scenes

Files:

Modify: chat/state/entities.py (_apply_bot_reset extended to also remove the bot's guest_bot_id references in OTHER chats: UPDATE chats SET guest_bot_id = NULL WHERE guest_bot_id = ?; remove the bot's activity row in those chats too)
Modify: tests in tests/test_reset.py (add scenario: bot is guest in another's chat; reset clears the guest reference)

Spec: Currently bot_reset purges the bot's own chat state, memories, and edges. With Phase 2, a bot can be a guest in another bot's chat — that reference must also clear. Otherwise the host's chat sees a stale guest_bot_id pointing at a phantom bot.

Update _apply_bot_reset handler:

# After existing purges:
conn.execute("UPDATE chats SET guest_bot_id = NULL WHERE guest_bot_id = ?", (bot_id,))
conn.execute("DELETE FROM activity WHERE entity_id = ?", (bot_id,))  # already there; covers all chats

(Activity is keyed by entity_id, so the existing line handles cross-chat activity rows already.)
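
The two statements can be exercised against an in-memory SQLite database. The table layouts below (including the host_bot_id and description columns) are assumptions made for this sketch; only chats.guest_bot_id and activity.entity_id come from the plan.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE chats (id TEXT PRIMARY KEY, host_bot_id TEXT NOT NULL, guest_bot_id TEXT);
        CREATE TABLE activity (entity_id TEXT PRIMARY KEY, description TEXT);
        INSERT INTO chats VALUES ('chat_a', 'bot_a', 'bot_b'), ('chat_b', 'bot_b', NULL);
        INSERT INTO activity VALUES ('bot_a', 'reading'), ('bot_b', 'cooking');
        """
    )
    return conn


def clear_guest_references(conn: sqlite3.Connection, bot_id: str) -> None:
    """Null out guest references to the bot in any chat and drop its activity row."""
    conn.execute("UPDATE chats SET guest_bot_id = NULL WHERE guest_bot_id = ?", (bot_id,))
    conn.execute("DELETE FROM activity WHERE entity_id = ?", (bot_id,))


conn = make_demo_db()
clear_guest_references(conn, "bot_b")
```

After the call, chat_a no longer references bot_b as a guest, and bot_a's activity row is untouched.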

Tests: 2 minimum, added to tests/test_reset.py.

BotB is guest in BotA's chat. Reset BotB. Assert chat_bot_a.guest_bot_id is NULL.
BotB has memories (witness flag set, owner=bot_b) from being guest in BotA's chat. Reset BotB. Assert those memories are gone.

Commit: fix: bot_reset cascades to guest references in other chats

Task 48: Phase 2 documentation update

Files:

Modify: CLAUDE.md (add "Phase 2 status" section; update "Behavioral defaults" with multi-entity additions; add to "Phase 1.5 / 2 cleanup backlog" any v2 follow-ups discovered during execution)
Modify: docs/plans/2026-04-26-v1-requirements-design.md (mark Phase 2 deliverables as "shipped" in the appendix decisions log)

Spec: Documentation-only task. Run last in Phase 2 so it captures any deviations from the plan that emerged during execution. Reflect:

Multi-entity scene support (you + host + guest).
Interjection model (default false; explicit signals only).
Per-POV summaries on close for all witnesses with memory stores.
Group node populated on first 3-entity scene; updated on close.
Phase 2 known limitations:
- "Meanwhile…" (scene config 4 — bot+bot without you) deferred to Phase 3.
- Interjection regeneration deferred (regenerate only acts on the addressee turn).
- Addressee detection is a simple name-match heuristic (no classifier call yet).

Commit: docs: phase 2 status, behavioral defaults, deferred items

Wrap-up

After Wave 5 lands:

Run full suite on phase-2: should be ~210+ tests passing (168 from Phase 1 + ~45 new).
Manual smoke:
- Add a guest to one of the seeded bots' chats via the drawer.
- Verify "have they met?" prose seeds inter-bot edges.
- Play a few turns; verify host responds normally; verify guest occasionally interjects.
- Close the scene; check drawer for two distinct per-POV summaries.
- Remove guest mid-scene; check scene_closed fires.
- Reset a guest bot from another chat; verify guest_bot_id reference clears.
Push phase-2 to gitea.
Open PR phase-2 → main.
Phase 2.5 backlog candidates (track in CLAUDE.md): interjection regenerate UI, classifier-based addressee detection, group-node LLM-merged meta-summary, drawer "first-meeting" gate vs "they already know each other" toggle, witness flag editing in drawer (currently read-only by spec).

Notes for the controller running this plan

Don't dispatch Wave 4b until Wave 4a is merged AND tested green on phase-2. Wave 4b's turns.py changes depend on the new assemble_narrative_prompt signature from Wave 4a's prompt.py; without it, the turn flow fails as soon as guest_id is passed.
After each parallel wave, the controller should run a code-review subagent (subagent-driven-development skill's two-stage review pattern) on each task before merging to phase-2. For purely mechanical tasks, a combined spec+quality review is acceptable.
If a parallel wave's merge produces a conflict, the wave's file-disjointness assumption was violated. Bisect the affected pair, fix the offending task in a follow-up commit on phase-2, and proceed.
Token-spend rough estimate: Phase 2 should be ~30-40% the size of Phase 1 (smaller scope; reuses Phase 1 patterns). Per-task token spend similar to Phase 1.
DO NOT modify Phase 1 code paths unless explicitly required (e.g., Wave 5 T47 modifies _apply_bot_reset because the cascade is genuinely new behavior). The single-bot path must continue to work end-to-end after each wave.
