Roleplay Engine — Phase 1 Implementation Plan
For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
Goal: Build the v1 (Phase 1) roleplay engine end-to-end — a local-first FastAPI + HTMX app with single-bot chats, persistent bot-owned memory, per-chat clocks, an event-sourced SQLite backend, multi-tab SSE streaming, drawer state surface, rewind / regenerate / reset, and Featherless inference (narrative + classifier models).
Architecture: Python 3.11+ FastAPI server, SQLite (single file, WAL mode) projected from an append-only event log. Featherless OpenAI-compatible client behind a LLMClient interface. Per-chat in-process pub/sub queue broadcasts state changes over SSE to all subscribed browser tabs. State changes always go through events; the projector applies them. TDD: every task starts with a failing test.
Tech Stack:
- Python 3.11+, FastAPI, Uvicorn, HTMX (CDN), Jinja2 templates, vanilla CSS.
- SQLite (stdlib sqlite3), aiosqlite for async paths where useful.
- pydantic for state schemas, pydantic-settings for config.
- instructor (or Featherless-native JSON mode) for classifier-constrained output via the openai SDK pointed at https://api.featherless.ai/v1.
- tiktoken for token accounting.
- pytest, pytest-asyncio, httpx (for FastAPI TestClient), freezegun for time tests.
Source-of-truth references:
- Requirements: 2026-04-26-v1-requirements-design.md
- Architecture: ../../rp-engine-design.md
- Conventions: ../../CLAUDE.md
When a task says "see §X", that's the requirements doc unless stated otherwise.
Pre-flight
Worktree: This is a greenfield repo on main. Branch off into phase-1 before starting:
git checkout -b phase-1
Python env: Use a project-local venv (<repo>/.venv/). Add .venv/ and __pycache__/ to .gitignore in T0.
Featherless API key: Stored in data/config.toml (gitignored). The plan creates an example file in T1; you copy it and paste in your real key locally.
TDD discipline: Every task starts with a failing test. Don't skip step 2 ("run to verify it fails"). If the test passes before implementation, the test is wrong — fix the test first.
Commit cadence: One commit per task. Commit messages use feat:, chore:, test:, docs: prefixes.
Verification before claiming done: Use superpowers-extended-cc:verification-before-completion — run the test command and read its actual output. Do not claim a task complete on hope.
Phase 1A: Foundation
Task 0: Project skeleton
Files:
- Create: pyproject.toml
- Create: .python-version
- Create: chat/__init__.py
- Create: chat/app.py
- Create: tests/__init__.py
- Create: tests/test_health.py
- Modify: .gitignore (add .venv/, __pycache__/, *.pyc, .pytest_cache/)
Step 1: Write the failing test
# tests/test_health.py
from fastapi.testclient import TestClient
from chat.app import app

def test_health_endpoint_returns_ok():
    client = TestClient(app)
    response = client.get("/health")
    assert response.status_code == 200
    assert response.json() == {"status": "ok"}
Step 2: Run test to verify it fails
python -m venv .venv && source .venv/bin/activate
pip install fastapi "uvicorn[standard]" httpx pytest pytest-asyncio
pytest tests/test_health.py -v
Expected: ImportError on chat.app (module doesn't exist).
Step 3: Write minimal implementation
# chat/app.py
from fastapi import FastAPI

app = FastAPI(title="chat")

@app.get("/health")
def health():
    return {"status": "ok"}
pyproject.toml minimum:
[project]
name = "chat"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
"fastapi>=0.110",
"uvicorn[standard]>=0.30",
"httpx>=0.27",
"pydantic>=2.6",
"pydantic-settings>=2.2",
"openai>=1.30",
"instructor>=1.3",
"tiktoken>=0.7",
"jinja2>=3.1",
"aiosqlite>=0.20",
]
[project.optional-dependencies]
dev = ["pytest>=8", "pytest-asyncio>=0.23", "freezegun>=1.4"]
[tool.pytest.ini_options]
pythonpath = ["."]
asyncio_mode = "auto"
Step 4: Run test to verify it passes
pip install -e ".[dev]"
pytest tests/test_health.py -v
Expected: 1 passed.
Step 5: Commit
git add pyproject.toml .python-version chat/ tests/ .gitignore
git commit -m "feat: project skeleton with health endpoint"
Task 1: Config loading
Loads data/config.toml (or the file named by the CHAT_CONFIG_PATH env var), honors the CHAT_DB_PATH env var override, and exposes a Settings pydantic model. See requirements §3 / §12.
Files:
- Create: chat/config.py
- Create: data/config.example.toml
- Create: tests/test_config.py
Step 1: Write the failing test
# tests/test_config.py
import os
from pathlib import Path
import pytest
from chat.config import load_settings

def test_load_settings_reads_toml(tmp_path, monkeypatch):
    cfg = tmp_path / "config.toml"
    cfg.write_text("""
featherless_api_key = "sk-test"
narrative_model = "dphn/Dolphin-Mistral-24B-Venice-Edition"
classifier_model = "NousResearch/Hermes-3-Llama-3.1-8B"
ooc_marker = "(("
retrieval_k = 4
""")
    monkeypatch.setenv("CHAT_CONFIG_PATH", str(cfg))
    s = load_settings()
    assert s.featherless_api_key == "sk-test"
    assert s.narrative_model.startswith("dphn/")
    assert s.retrieval_k == 4

def test_chat_db_path_env_overrides_default(tmp_path, monkeypatch):
    monkeypatch.setenv("CHAT_DB_PATH", str(tmp_path / "alt.db"))
    monkeypatch.setenv("CHAT_CONFIG_PATH", str(tmp_path / "config.toml"))
    (tmp_path / "config.toml").write_text('featherless_api_key = "x"\n')
    s = load_settings()
    assert s.db_path == tmp_path / "alt.db"
Step 2: Run test to verify it fails
pytest tests/test_config.py -v
Expected: ImportError or AttributeError.
Step 3: Write minimal implementation
# chat/config.py
from __future__ import annotations
import os
import tomllib
from pathlib import Path
from pydantic import BaseModel, Field

REPO_ROOT = Path(__file__).resolve().parent.parent
DEFAULT_CONFIG = REPO_ROOT / "data" / "config.toml"
DEFAULT_DB = REPO_ROOT / "data" / "chat.db"

class Settings(BaseModel):
    featherless_api_key: str
    featherless_base_url: str = "https://api.featherless.ai/v1"
    narrative_model: str = "dphn/Dolphin-Mistral-24B-Venice-Edition"
    classifier_model: str = "NousResearch/Hermes-3-Llama-3.1-8B"
    classifier_fallbacks: list[str] = Field(
        default_factory=lambda: [
            "cognitivecomputations/dolphin-2.9.4-llama3-8b",
            "mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated",
        ]
    )
    ooc_marker: str = "(("
    retrieval_k: int = 4
    narrative_budget_hard: int = 8000
    narrative_budget_soft: int = 6000
    classifier_budget_hard: int = 4000
    classifier_timeout_s: float = 10.0
    db_path: Path = DEFAULT_DB
    data_dir: Path = REPO_ROOT / "data"
    bind_host: str = "127.0.0.1"
    bind_port: int = 8000

def load_settings() -> Settings:
    config_path = Path(os.environ.get("CHAT_CONFIG_PATH", DEFAULT_CONFIG))
    raw: dict = {}
    if config_path.exists():
        raw = tomllib.loads(config_path.read_text())
    # The env var wins over any db_path set in the TOML file.
    if "CHAT_DB_PATH" in os.environ:
        raw["db_path"] = Path(os.environ["CHAT_DB_PATH"])
    return Settings(**raw)
data/config.example.toml:
# Copy this file to data/config.toml and fill in your API key.
featherless_api_key = "REPLACE_ME"
narrative_model = "dphn/Dolphin-Mistral-24B-Venice-Edition"
classifier_model = "NousResearch/Hermes-3-Llama-3.1-8B"
ooc_marker = "(("
retrieval_k = 4
Step 4: Run test to verify it passes
pytest tests/test_config.py -v
Expected: 2 passed.
Step 5: Commit
git add chat/config.py data/config.example.toml tests/test_config.py
git commit -m "feat: config loader with toml + env override"
Task 2: SQLite migrations framework
Establishes a forward-only migration runner reading SQL files from chat/db/migrations/, tracked in a meta table (key/value).
Files:
- Create: chat/db/__init__.py
- Create: chat/db/connection.py
- Create: chat/db/migrate.py
- Create: chat/db/migrations/0001_init_meta.sql
- Create: tests/test_migrate.py
Step 1: Write the failing test
# tests/test_migrate.py
from chat.db.connection import open_db
from chat.db.migrate import apply_migrations

def test_apply_migrations_creates_meta_table(tmp_path):
    db = tmp_path / "test.db"
    apply_migrations(db)
    with open_db(db) as conn:
        row = conn.execute(
            "SELECT value FROM meta WHERE key = 'schema_version'"
        ).fetchone()
        assert row is not None
        assert int(row[0]) >= 1

def test_apply_migrations_idempotent(tmp_path):
    db = tmp_path / "test.db"
    apply_migrations(db)
    apply_migrations(db)  # second call must be a no-op
    with open_db(db) as conn:
        count = conn.execute("SELECT COUNT(*) FROM meta").fetchone()[0]
        assert count == 1
Step 2: Run test to verify it fails
pytest tests/test_migrate.py -v
Expected: ImportError.
Step 3: Write minimal implementation
# chat/db/connection.py
from __future__ import annotations
import sqlite3
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def open_db(path: Path):
    path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA foreign_keys=ON")
    try:
        yield conn
        conn.commit()  # commit only when the with-block exits cleanly
    finally:
        conn.close()

# chat/db/migrate.py
from __future__ import annotations
from pathlib import Path
from chat.db.connection import open_db

MIGRATIONS_DIR = Path(__file__).parent / "migrations"

def apply_migrations(db_path: Path) -> None:
    with open_db(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)"
        )
        cur = conn.execute("SELECT value FROM meta WHERE key = 'schema_version'")
        row = cur.fetchone()
        current = int(row[0]) if row else 0
        # Files are named NNNN_description.sql; apply only versions above current.
        for path in sorted(MIGRATIONS_DIR.glob("*.sql")):
            version = int(path.stem.split("_", 1)[0])
            if version <= current:
                continue
            sql = path.read_text()
            conn.executescript(sql)
            conn.execute(
                "INSERT OR REPLACE INTO meta (key, value) VALUES ('schema_version', ?)",
                (str(version),),
            )
-- chat/db/migrations/0001_init_meta.sql
-- meta table is created by the migrate runner; this migration is a marker.
SELECT 1;
Step 4: Run test to verify it passes
pytest tests/test_migrate.py -v
Expected: 2 passed.
Step 5: Commit
git add chat/db/ tests/test_migrate.py
git commit -m "feat: sqlite migration runner with meta version table"
Task 3: Featherless client with mock
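Defines the LLMClient protocol with generate(messages, *, model, **params) for one-shot completions and stream(messages, *, model, **params) for token streaming. Implementations: FeatherlessClient (real), MockLLMClient (test). Schema-constrained output is layered on top by the classifier wrapper in Task 4.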
Files:
- Create: chat/llm/__init__.py
- Create: chat/llm/client.py
- Create: chat/llm/featherless.py
- Create: chat/llm/mock.py
- Create: tests/test_llm_mock.py
Step 1: Write the failing test
# tests/test_llm_mock.py
import pytest
from chat.llm.mock import MockLLMClient
from chat.llm.client import Message

@pytest.mark.asyncio
async def test_mock_returns_canned_response():
    client = MockLLMClient(canned=["Hello, world."])
    msgs = [Message(role="user", content="hi")]
    out = await client.generate(msgs, model="any")
    assert out == "Hello, world."

@pytest.mark.asyncio
async def test_mock_streams_tokens():
    client = MockLLMClient(canned=["abcd"])
    msgs = [Message(role="user", content="hi")]
    chunks = []
    async for chunk in client.stream(msgs, model="any"):
        chunks.append(chunk)
    assert "".join(chunks) == "abcd"
Step 2: Run test to verify it fails
pytest tests/test_llm_mock.py -v
Expected: ImportError.
Step 3: Write minimal implementation
# chat/llm/client.py
from __future__ import annotations
from dataclasses import dataclass
from typing import Protocol, AsyncIterator, Sequence

@dataclass
class Message:
    role: str  # "system" | "user" | "assistant"
    content: str

class LLMClient(Protocol):
    async def generate(self, messages: Sequence[Message], *, model: str, **params) -> str: ...
    # Not "async def": implementations are async generators, which return an AsyncIterator when called.
    def stream(self, messages: Sequence[Message], *, model: str, **params) -> AsyncIterator[str]: ...

# chat/llm/mock.py
from __future__ import annotations
from typing import AsyncIterator, Sequence
from .client import Message

class MockLLMClient:
    def __init__(self, canned: list[str]):
        self._canned = list(canned)

    async def generate(self, messages: Sequence[Message], *, model: str, **params) -> str:
        return self._canned.pop(0)

    async def stream(self, messages: Sequence[Message], *, model: str, **params) -> AsyncIterator[str]:
        text = self._canned.pop(0)
        for ch in text:
            yield ch

# chat/llm/featherless.py
from __future__ import annotations
from typing import AsyncIterator, Sequence
from openai import AsyncOpenAI
from .client import Message

class FeatherlessClient:
    def __init__(self, api_key: str, base_url: str = "https://api.featherless.ai/v1"):
        self._client = AsyncOpenAI(api_key=api_key, base_url=base_url)

    async def generate(self, messages: Sequence[Message], *, model: str, **params) -> str:
        resp = await self._client.chat.completions.create(
            model=model,
            messages=[{"role": m.role, "content": m.content} for m in messages],
            **params,
        )
        return resp.choices[0].message.content or ""

    async def stream(self, messages: Sequence[Message], *, model: str, **params) -> AsyncIterator[str]:
        stream = await self._client.chat.completions.create(
            model=model,
            messages=[{"role": m.role, "content": m.content} for m in messages],
            stream=True,
            **params,
        )
        async for chunk in stream:
            # Some chunks (e.g. a trailing usage chunk) may carry no choices.
            if not chunk.choices:
                continue
            delta = chunk.choices[0].delta.content or ""
            if delta:
                yield delta
Step 4: Run test to verify it passes
pytest tests/test_llm_mock.py -v
Expected: 2 passed.
Step 5: Commit
git add chat/llm/ tests/test_llm_mock.py
git commit -m "feat: LLMClient protocol with Featherless and mock implementations"
Task 4: Classifier service wrapper
Wraps the classifier model with retry, timeout, and Pydantic-constrained output (per requirements §3.3). Falls back to schema-default on persistent failure. Logs failures to classifier_failures table.
Files:
- Create: chat/db/migrations/0002_classifier_failures.sql
- Create: chat/llm/classify.py
- Create: tests/test_classify.py
Step 1: Write the failing test
# tests/test_classify.py
import pytest
from pydantic import BaseModel
from chat.llm.mock import MockLLMClient
from chat.llm.classify import classify

class Verdict(BaseModel):
    score: int
    reason: str

@pytest.mark.asyncio
async def test_classify_parses_valid_json():
    mock = MockLLMClient(canned=['{"score": 2, "reason": "notable"}'])
    result = await classify(mock, model="m", system="x", user="y", schema=Verdict)
    assert result.score == 2

@pytest.mark.asyncio
async def test_classify_falls_back_on_unparseable_after_retry():
    mock = MockLLMClient(canned=["nope", "still nope"])
    default = Verdict(score=1, reason="fallback")
    result = await classify(mock, model="m", system="x", user="y", schema=Verdict, default=default)
    assert result.reason == "fallback"
Step 2: Run test to verify it fails
pytest tests/test_classify.py -v
Expected: ImportError.
Step 3: Write minimal implementation
chat/db/migrations/0002_classifier_failures.sql:
CREATE TABLE classifier_failures (
id INTEGER PRIMARY KEY,
kind TEXT NOT NULL,
model TEXT NOT NULL,
raw_text TEXT,
attempt_count INTEGER NOT NULL,
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
chat/llm/classify.py:
from __future__ import annotations
import json
import asyncio
from typing import TypeVar
from pydantic import BaseModel, ValidationError
from .client import LLMClient, Message

T = TypeVar("T", bound=BaseModel)
REFUSAL_PATTERNS = ("i can't", "i cannot", "i'm sorry, but", "as an ai")

async def classify(
    client: LLMClient,
    *,
    model: str,
    system: str,
    user: str,
    schema: type[T],
    default: T | None = None,
    timeout_s: float = 10.0,
) -> T:
    msgs = [
        Message(role="system", content=system + "\n\nRespond with JSON only matching the schema."),
        Message(role="user", content=user),
    ]
    # Two attempts: the second uses a stricter system prompt.
    for attempt in range(2):
        try:
            text = await asyncio.wait_for(
                client.generate(msgs, model=model, response_format={"type": "json_object"}),
                timeout=timeout_s,
            )
            if any(p in text.lower()[:80] for p in REFUSAL_PATTERNS) and not text.strip().startswith("{"):
                raise ValueError("refusal-shaped response")
            return schema.model_validate_json(text)
        except (ValidationError, ValueError, json.JSONDecodeError, asyncio.TimeoutError):
            msgs[0] = Message(role="system", content=system + "\n\nRespond with valid JSON ONLY. No prose.")
            continue
    if default is None:
        raise RuntimeError(f"classify failed for schema {schema.__name__} with no default")
    return default
Step 4: Run test to verify it passes
pytest tests/test_classify.py -v
Expected: 2 passed.
Step 5: Commit
git add chat/llm/classify.py chat/db/migrations/0002_classifier_failures.sql tests/test_classify.py
git commit -m "feat: classifier wrapper with retry, timeout, schema-default fallback"
Phase 1B: Event log & state machine
Task 5: Event log + projector skeleton
Append-only event log with one row per event (id, branch_id, ts, kind, payload_json). Projector framework that dispatches per-kind handlers; initial registry is empty. State changes ALWAYS go through append_event.
Files:
- Create: chat/db/migrations/0003_event_log.sql
- Create: chat/eventlog/__init__.py
- Create: chat/eventlog/log.py
- Create: chat/eventlog/projector.py
- Create: tests/test_eventlog.py
Step 1: Write the failing test
# tests/test_eventlog.py
from chat.db.migrate import apply_migrations
from chat.db.connection import open_db
from chat.eventlog.log import append_event, read_events

def test_append_and_read(tmp_path):
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        eid = append_event(conn, kind="test_kind", payload={"a": 1})
        assert eid > 0
        rows = list(read_events(conn))
        assert len(rows) == 1
        assert rows[0].kind == "test_kind"
        assert rows[0].payload["a"] == 1
Step 2: Run test to verify it fails
pytest tests/test_eventlog.py -v
Expected: ImportError (the chat.eventlog module and the 0003 migration do not exist yet).
Step 3: Write minimal implementation
chat/db/migrations/0003_event_log.sql:
CREATE TABLE event_log (
id INTEGER PRIMARY KEY,
branch_id INTEGER NOT NULL DEFAULT 1,
ts TEXT NOT NULL DEFAULT (datetime('now')),
kind TEXT NOT NULL,
payload_json TEXT NOT NULL,
superseded_by INTEGER REFERENCES event_log(id),
hidden INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX idx_event_log_branch_kind ON event_log(branch_id, kind);
chat/eventlog/log.py:
from __future__ import annotations
import json
from dataclasses import dataclass
from typing import Any, Iterator
from sqlite3 import Connection

@dataclass
class Event:
    id: int
    branch_id: int
    ts: str
    kind: str
    payload: dict[str, Any]
    superseded_by: int | None
    hidden: bool

def append_event(conn: Connection, *, kind: str, payload: dict[str, Any], branch_id: int = 1) -> int:
    cur = conn.execute(
        "INSERT INTO event_log (branch_id, kind, payload_json) VALUES (?, ?, ?)",
        (branch_id, kind, json.dumps(payload)),
    )
    return cur.lastrowid

def read_events(conn: Connection, branch_id: int = 1, after_id: int = 0) -> Iterator[Event]:
    # Hidden and superseded events are skipped, so callers only see live history.
    cur = conn.execute(
        "SELECT id, branch_id, ts, kind, payload_json, superseded_by, hidden "
        "FROM event_log WHERE branch_id = ? AND id > ? AND hidden = 0 "
        "AND superseded_by IS NULL ORDER BY id",
        (branch_id, after_id),
    )
    for row in cur:
        yield Event(
            id=row[0], branch_id=row[1], ts=row[2], kind=row[3],
            payload=json.loads(row[4]), superseded_by=row[5], hidden=bool(row[6]),
        )
chat/eventlog/projector.py:
from __future__ import annotations
from collections.abc import Callable
from sqlite3 import Connection
from .log import Event, read_events

Handler = Callable[[Connection, Event], None]
_REGISTRY: dict[str, Handler] = {}

def on(kind: str):
    """Decorator that registers a handler for one event kind."""
    def deco(fn: Handler) -> Handler:
        _REGISTRY[kind] = fn
        return fn
    return deco

def project(conn: Connection, branch_id: int = 1) -> None:
    """Replay every live event on the branch through its handler."""
    for event in read_events(conn, branch_id=branch_id):
        h = _REGISTRY.get(event.kind)
        if h:
            h(conn, event)

def apply_event(conn: Connection, event: Event) -> None:
    """Apply a single event; kinds without a handler are ignored."""
    h = _REGISTRY.get(event.kind)
    if h:
        h(conn, event)
Step 4: Run test to verify it passes
pytest tests/test_eventlog.py -v
Expected: 1 passed.
Step 5: Commit
git add chat/eventlog/ chat/db/migrations/0003_event_log.sql tests/test_eventlog.py
git commit -m "feat: append-only event log with projector skeleton"
Task 6: Bot + You entity schemas and events
Adds bots and you_entity projected tables, bot_authored and you_authored event kinds. Identity is immutable per session — re-authoring writes a new event.
Files:
- Create: chat/db/migrations/0004_entities.sql
- Create: chat/state/__init__.py
- Create: chat/state/entities.py
- Modify: chat/eventlog/projector.py (import handlers)
- Create: tests/test_entities.py
Step 1: Write the failing test
# tests/test_entities.py
from chat.db.migrate import apply_migrations
from chat.db.connection import open_db
from chat.eventlog.log import append_event
from chat.eventlog.projector import project
from chat.state.entities import get_bot, list_bots, get_you
import chat.state.entities  # registers handlers

def test_bot_authored_creates_bot_row(tmp_path):
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        append_event(conn, kind="bot_authored", payload={
            "id": "bot_a", "name": "BotA",
            "persona": "...", "voice_samples": ["sample"], "traits": ["shy"],
            "backstory": "...",
            "initial_relationship_to_you": "coworker",
            "kickoff_prose": "you stay late",
        })
        project(conn)
        bot = get_bot(conn, "bot_a")
        assert bot is not None
        assert bot["name"] == "BotA"
        assert bot["traits"] == ["shy"]
        assert "bot_a" in [b["id"] for b in list_bots(conn)]

def test_you_authored_creates_you_singleton(tmp_path):
    db = tmp_path / "t.db"
    apply_migrations(db)
    with open_db(db) as conn:
        append_event(conn, kind="you_authored", payload={
            "name": "Me", "pronouns": "they/them", "persona": "engineer",
        })
        project(conn)
        you = get_you(conn)
        assert you is not None
        assert you["name"] == "Me"
Step 2: Run, verify fail.
Step 3: Implementation.
chat/db/migrations/0004_entities.sql:
CREATE TABLE bots (
id TEXT PRIMARY KEY,
name TEXT NOT NULL,
persona TEXT NOT NULL,
voice_samples_json TEXT NOT NULL DEFAULT '[]',
traits_json TEXT NOT NULL DEFAULT '[]',
backstory TEXT NOT NULL DEFAULT '',
initial_relationship_to_you TEXT NOT NULL DEFAULT '',
kickoff_prose TEXT NOT NULL DEFAULT '',
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE you_entity (
id INTEGER PRIMARY KEY CHECK (id = 1),
name TEXT NOT NULL,
pronouns TEXT NOT NULL DEFAULT '',
persona TEXT NOT NULL DEFAULT ''
);
chat/state/entities.py:
from __future__ import annotations
import json
from sqlite3 import Connection
from chat.eventlog.projector import on
from chat.eventlog.log import Event

@on("bot_authored")
def _apply_bot_authored(conn: Connection, e: Event) -> None:
    p = e.payload
    # INSERT OR REPLACE keeps replay idempotent and lets re-authoring overwrite the row.
    conn.execute(
        "INSERT OR REPLACE INTO bots "
        "(id, name, persona, voice_samples_json, traits_json, backstory, "
        " initial_relationship_to_you, kickoff_prose) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (p["id"], p["name"], p["persona"],
         json.dumps(p.get("voice_samples", [])),
         json.dumps(p.get("traits", [])),
         p.get("backstory", ""),
         p.get("initial_relationship_to_you", ""),
         p.get("kickoff_prose", "")),
    )

@on("you_authored")
def _apply_you_authored(conn: Connection, e: Event) -> None:
    p = e.payload
    conn.execute(
        "INSERT OR REPLACE INTO you_entity (id, name, pronouns, persona) VALUES (1, ?, ?, ?)",
        (p["name"], p.get("pronouns", ""), p.get("persona", "")),
    )

def get_bot(conn: Connection, bot_id: str) -> dict | None:
    row = conn.execute("SELECT * FROM bots WHERE id = ?", (bot_id,)).fetchone()
    if not row:
        return None
    cols = [c[1] for c in conn.execute("PRAGMA table_info(bots)").fetchall()]
    d = dict(zip(cols, row))
    d["voice_samples"] = json.loads(d.pop("voice_samples_json"))
    d["traits"] = json.loads(d.pop("traits_json"))
    return d

def list_bots(conn: Connection) -> list[dict]:
    cur = conn.execute("SELECT id, name FROM bots ORDER BY name")
    return [{"id": r[0], "name": r[1]} for r in cur]

def get_you(conn: Connection) -> dict | None:
    row = conn.execute("SELECT name, pronouns, persona FROM you_entity WHERE id = 1").fetchone()
    if not row:
        return None
    return {"name": row[0], "pronouns": row[1], "persona": row[2]}
Step 4: Run, verify pass.
Step 5: Commit.
git add chat/db/migrations/0004_entities.sql chat/state/ tests/test_entities.py
git commit -m "feat: bot and you entity schemas with projector handlers"
Task 7: Edges schema + per-turn deltas
Per requirements §3.4. Edges table holds per-pair directed state. edge_update event applies deltas (affinity, trust, knowledge_facts, last_interaction). Summary rewrites are a separate event kind written at scene close (T27).
Files:
- Create: chat/db/migrations/0005_edges.sql
- Create: chat/state/edges.py
- Create: tests/test_edges.py
Test sketch:
def test_edge_update_applies_affinity_delta(tmp_path):
    # bot_authored, you_authored, then edge_update with affinity_delta=+5
    # assert edges row exists with affinity=initial+5
    ...
Implementation sketch:
CREATE TABLE edges (
id INTEGER PRIMARY KEY,
chat_id TEXT, -- null for default initial seed
source_id TEXT NOT NULL,
target_id TEXT NOT NULL,
affinity INTEGER NOT NULL DEFAULT 50,
trust INTEGER NOT NULL DEFAULT 50,
summary TEXT NOT NULL DEFAULT '',
knowledge_json TEXT NOT NULL DEFAULT '[]',
last_interaction_chat_id TEXT,
last_interaction_at TEXT,
UNIQUE (source_id, target_id)
);
@on("edge_update")
def _apply_edge_update(conn, e):
    p = e.payload
    # upsert + apply deltas; clamp affinity/trust to 0..100
    # append knowledge_facts if any
    # bump last_interaction fields
    ...
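The three comment steps in the handler sketch could be filled in roughly as follows. This is a minimal sketch, not the plan's implementation: the payload keys (source_id, target_id, affinity_delta, trust_delta, knowledge_facts, chat_id) are assumptions, and the function takes a plain dict and skips the @on registration so it can run standalone against the edges schema above.

```python
import json
import sqlite3

# Copy of the edges schema from the sketch above, so this example is self-contained.
EDGES_DDL = """
CREATE TABLE edges (
    id INTEGER PRIMARY KEY,
    chat_id TEXT,
    source_id TEXT NOT NULL,
    target_id TEXT NOT NULL,
    affinity INTEGER NOT NULL DEFAULT 50,
    trust INTEGER NOT NULL DEFAULT 50,
    summary TEXT NOT NULL DEFAULT '',
    knowledge_json TEXT NOT NULL DEFAULT '[]',
    last_interaction_chat_id TEXT,
    last_interaction_at TEXT,
    UNIQUE (source_id, target_id)
);
"""


def clamp(value: int, lo: int = 0, hi: int = 100) -> int:
    return max(lo, min(hi, value))


def apply_edge_update(conn: sqlite3.Connection, payload: dict) -> None:
    """Hypothetical handler body; payload keys are assumed, not from the spec."""
    src, dst = payload["source_id"], payload["target_id"]
    # Ensure the row exists so the deltas apply on top of the column defaults.
    conn.execute(
        "INSERT OR IGNORE INTO edges (source_id, target_id) VALUES (?, ?)",
        (src, dst),
    )
    affinity, trust, knowledge_json = conn.execute(
        "SELECT affinity, trust, knowledge_json FROM edges "
        "WHERE source_id = ? AND target_id = ?",
        (src, dst),
    ).fetchone()
    facts = json.loads(knowledge_json) + list(payload.get("knowledge_facts", []))
    conn.execute(
        "UPDATE edges SET affinity = ?, trust = ?, knowledge_json = ?, "
        "last_interaction_chat_id = ?, last_interaction_at = datetime('now') "
        "WHERE source_id = ? AND target_id = ?",
        (
            clamp(affinity + payload.get("affinity_delta", 0)),
            clamp(trust + payload.get("trust_delta", 0)),
            json.dumps(facts),
            payload.get("chat_id"),
            src,
            dst,
        ),
    )
```

Reading the current values and writing clamped results keeps the 0..100 bounds in Python, where they are easy to unit-test; a pure-SQL MIN/MAX expression would work as well.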
Commit: feat: directed edges with per-turn delta projector
Task 8: Memory schema + witness flag
Memories are bot-owned. Witnessed-by mask stored per memory.
Files:
- Create: chat/db/migrations/0006_memories.sql
- Create: chat/state/memory.py
- Create: tests/test_memory.py
Schema:
CREATE TABLE memories (
id INTEGER PRIMARY KEY,
owner_id TEXT NOT NULL, -- bot id whose POV this is
chat_id TEXT NOT NULL,
scene_id INTEGER,
pov_summary TEXT NOT NULL,
witness_you INTEGER NOT NULL,
witness_host INTEGER NOT NULL,
witness_guest INTEGER NOT NULL,
chat_clock_at TEXT,
source TEXT, -- e.g. "direct" | "told_by:bot_id"
reliability REAL NOT NULL DEFAULT 1.0,
significance INTEGER NOT NULL DEFAULT 1,
pinned INTEGER NOT NULL DEFAULT 0,
auto_pinned INTEGER NOT NULL DEFAULT 0,
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_memories_owner ON memories(owner_id);
-- FTS5 index on pov_summary, scoped by owner_id
CREATE VIRTUAL TABLE memories_fts USING fts5(
pov_summary, content='memories', content_rowid='id'
);
CREATE TRIGGER memories_ai AFTER INSERT ON memories BEGIN
INSERT INTO memories_fts(rowid, pov_summary) VALUES (new.id, new.pov_summary);
END;
CREATE TRIGGER memories_au AFTER UPDATE ON memories BEGIN
INSERT INTO memories_fts(memories_fts, rowid, pov_summary)
VALUES('delete', old.id, old.pov_summary);
INSERT INTO memories_fts(rowid, pov_summary) VALUES (new.id, new.pov_summary);
END;
CREATE TRIGGER memories_ad AFTER DELETE ON memories BEGIN
INSERT INTO memories_fts(memories_fts, rowid, pov_summary)
VALUES('delete', old.id, old.pov_summary);
END;
memory_written event handler + helper functions:
@on("memory_written")
def _apply_memory_written(conn, e): ...

def get_pinned(conn, owner_id) -> list[dict]: ...

def search_memories(conn, owner_id: str, witness_role: str, query: str, k: int = 4) -> list[dict]:
    """FTS5 search filtered by witness bit. witness_role in {'you','host','guest'}."""
Tests: write a memory event with witness [1,1,0], assert search returns it for owner; assert search filtered by witness_guest=1 excludes it.
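One way search_memories might be written against the schema above is sketched here. It assumes the memories and memories_fts tables exist as defined, maps witness_role to a column through a fixed whitelist, and orders by FTS5's built-in rank; the returned dict shape is an assumption.

```python
import sqlite3

# witness_role -> column name. A fixed whitelist, so no caller-supplied text
# is ever interpolated into the SQL string.
WITNESS_COLUMNS = {
    "you": "witness_you",
    "host": "witness_host",
    "guest": "witness_guest",
}


def search_memories(conn: sqlite3.Connection, owner_id: str, witness_role: str,
                    query: str, k: int = 4) -> list[dict]:
    column = WITNESS_COLUMNS[witness_role]  # KeyError on an unknown role
    rows = conn.execute(
        "SELECT m.id, m.pov_summary "
        "FROM memories_fts "
        "JOIN memories AS m ON m.id = memories_fts.rowid "
        f"WHERE memories_fts MATCH ? AND m.owner_id = ? AND m.{column} = 1 "
        "ORDER BY rank LIMIT ?",
        (query, owner_id, k),
    ).fetchall()
    return [{"id": r[0], "pov_summary": r[1]} for r in rows]
```

The query string is passed to FTS5 as-is, so FTS5 query syntax applies; free-form user text may need quoting or sanitizing before it reaches MATCH.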
Commit: feat: memory schema with witness flags and FTS5 index
Task 9: Activity, container, scene, chat schemas
Adds the per-chat structural tables: chats, chat_state, containers, scenes, activity. Plus event handlers for chat_created, container_created, activity_change, scene_opened, scene_closed.
Files:
- Create: chat/db/migrations/0007_world.sql
- Create: chat/state/world.py
- Create: tests/test_world.py
Schema (key columns):
CREATE TABLE chats (
id TEXT PRIMARY KEY, -- e.g. "chat_botA"
host_bot_id TEXT NOT NULL,
guest_bot_id TEXT, -- null when no guest
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE chat_state (
chat_id TEXT PRIMARY KEY,
time TEXT NOT NULL, -- ISO 8601 UTC
weather TEXT NOT NULL DEFAULT '',
active_scene_id INTEGER,
narrative_anchor TEXT -- the in-fiction "Day 1 = ..." reference
);
CREATE TABLE containers (
id INTEGER PRIMARY KEY,
chat_id TEXT NOT NULL,
name TEXT NOT NULL,
type TEXT NOT NULL,
properties_json TEXT NOT NULL DEFAULT '{}',
parent_id INTEGER REFERENCES containers(id)
);
CREATE TABLE scenes (
id INTEGER PRIMARY KEY,
chat_id TEXT NOT NULL,
container_id INTEGER REFERENCES containers(id),
started_at TEXT NOT NULL,
ended_at TEXT,
significance INTEGER NOT NULL DEFAULT 0,
participants_json TEXT NOT NULL DEFAULT '[]'
);
CREATE TABLE activity (
entity_id TEXT PRIMARY KEY, -- "you" or bot_id
container_id INTEGER REFERENCES containers(id),
slot TEXT,
posture TEXT NOT NULL DEFAULT '',
action_json TEXT NOT NULL DEFAULT '{}',
attention TEXT NOT NULL DEFAULT '',
holding_json TEXT NOT NULL DEFAULT '[]',
status_json TEXT NOT NULL DEFAULT '{}',
updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
Handlers: chat_created, container_created, activity_change, scene_opened, scene_closed.
Tests: create chat → chat_state initialized; create container; activity_change updates activity row.
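As an illustration of the first test case (create chat, then chat_state is initialized), a chat_created handler might look like the sketch below. The payload keys (id, host_bot_id, guest_bot_id, initial_time, narrative_anchor) are assumptions, and the function takes a plain dict instead of an Event so it runs on its own; only the two tables it touches are reproduced.

```python
import sqlite3

# Subset of the 0007_world.sql schema needed for this example.
WORLD_DDL = """
CREATE TABLE chats (
    id TEXT PRIMARY KEY,
    host_bot_id TEXT NOT NULL,
    guest_bot_id TEXT,
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE chat_state (
    chat_id TEXT PRIMARY KEY,
    time TEXT NOT NULL,
    weather TEXT NOT NULL DEFAULT '',
    active_scene_id INTEGER,
    narrative_anchor TEXT
);
"""


def apply_chat_created(conn: sqlite3.Connection, payload: dict) -> None:
    """Hypothetical handler body: one chats row plus its initial chat_state row."""
    # INSERT OR REPLACE mirrors the entity handlers, so replaying the log is safe.
    conn.execute(
        "INSERT OR REPLACE INTO chats (id, host_bot_id, guest_bot_id) VALUES (?, ?, ?)",
        (payload["id"], payload["host_bot_id"], payload.get("guest_bot_id")),
    )
    conn.execute(
        "INSERT OR REPLACE INTO chat_state (chat_id, time, narrative_anchor) "
        "VALUES (?, ?, ?)",
        (payload["id"], payload["initial_time"], payload.get("narrative_anchor")),
    )
```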
Commit: feat: chats, chat_state, containers, scenes, activity tables
Phase 1C: Authoring
Task 10: Kickoff prose parser
Classifier call that converts authored kickoff prose into structured {container, activity_per_entity, edge_seed} for confirmation.
Files:
- Create: chat/services/kickoff.py
- Create: tests/test_kickoff.py
Schema returned by classifier:
class ActivityShape(BaseModel):
    posture: str
    action_verb: str
    action_interruptible: bool
    action_required_attention: str  # low|medium|high
    action_expected_duration: str
    attention: str = ""
    holding: list[str] = []

# Defined after ActivityShape so the annotations below resolve at class creation.
class KickoffParse(BaseModel):
    container_name: str
    container_type: str
    container_properties: dict  # moving, public, audible_range
    you_activity: ActivityShape
    bot_activity: ActivityShape
    initial_time_iso: str
    edge_seed_summary: str
    edge_seed_knowledge_facts: list[str]
Implementation: call classify(...) with a prompt that includes the bot's persona + relationship-to-you + kickoff prose. Return the parsed model.
Test: mock client returns canned JSON; assert structured fields populate.
Commit: feat: kickoff prose parser via classifier
Task 11: Bot authoring page
Form-based authoring UI; on submit, validates and writes bot_authored event. After save, redirects to kickoff parse-and-confirm (T13).
Files:
- Create: chat/templates/base.html
- Create: chat/templates/bot_form.html
- Create: chat/web/__init__.py
- Create: chat/web/bots.py
- Modify: chat/app.py (mount router, jinja env, static files)
- Create: chat/static/app.css
- Create: tests/test_bot_authoring.py
Test: POST to /bots/new with form fields; assert bot_authored event appended and bot row exists; response redirects to /bots/<id>/kickoff.
Implementation note: form fields map to identity per §5.1 (name, persona, voice_samples textarea split on ---, traits comma-separated, backstory, initial relationship to you, kickoff prose).
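The two list-valued fields need a small amount of parsing before they go into the bot_authored payload. A minimal sketch, assuming the --- separator sits on its own line in the textarea and that empty entries should be dropped; the helper names are hypothetical:

```python
def split_voice_samples(raw: str) -> list[str]:
    """Split textarea content on lines that contain only '---'."""
    samples: list[str] = []
    current: list[str] = []
    for line in raw.splitlines():
        if line.strip() == "---":
            samples.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    samples.append("\n".join(current).strip())
    # Drop empty samples caused by leading, trailing, or doubled separators.
    return [s for s in samples if s]


def split_traits(raw: str) -> list[str]:
    """Split a comma-separated traits field, trimming whitespace and blanks."""
    return [t.strip() for t in raw.split(",") if t.strip()]
```

Splitting on whole lines rather than on the bare substring keeps a sample that happens to contain "---" mid-sentence intact.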
Commit: feat: bot authoring form with bot_authored event
Task 12: You-entity authoring (Settings page)
Single-row form for the "you" entity. Lives at /settings. POST writes you_authored event.
Files:
- Create: chat/templates/settings.html
- Create: chat/web/settings.py
- Modify: chat/app.py
- Create: tests/test_settings.py
Commit: feat: settings page with you-entity authoring
Task 13: Kickoff parse-and-confirm flow
After bot authoring, the user lands on /bots/<id>/kickoff which shows the parsed kickoff in editable form. On confirm: append chat_created, container_created, activity_change (per entity), scene_opened, and an initial edge_update (the seed).
Files:
- Create: chat/templates/kickoff_confirm.html
- Create: chat/web/kickoff.py
- Create: tests/test_kickoff_confirm.py
Test: Submit a confirmed kickoff payload; assert chat exists, chat_state has time, container exists, activity rows present for you + bot, scene is open, edge has seed summary.
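The confirm step boils down to turning one confirmed payload into an ordered list of events. The sketch below shows that ordering only; the function name and every payload key inside the events are assumptions, while the keys read from confirmed follow the KickoffParse fields from Task 10.

```python
def kickoff_events(chat_id: str, bot_id: str, confirmed: dict) -> list[tuple[str, dict]]:
    """Return (kind, payload) pairs in the order they should be appended."""
    events: list[tuple[str, dict]] = [
        ("chat_created", {
            "id": chat_id,
            "host_bot_id": bot_id,
            "initial_time": confirmed["initial_time_iso"],
        }),
        ("container_created", {
            "chat_id": chat_id,
            "name": confirmed["container_name"],
            "type": confirmed["container_type"],
        }),
    ]
    # One activity_change per entity present at kickoff: "you" and the host bot.
    for entity_id, key in (("you", "you_activity"), (bot_id, "bot_activity")):
        events.append(
            ("activity_change", {"chat_id": chat_id, "entity_id": entity_id, **confirmed[key]})
        )
    events.append(("scene_opened", {"chat_id": chat_id, "participants": ["you", bot_id]}))
    # The seed edge carries no deltas, only the authored summary and facts.
    events.append(("edge_update", {
        "chat_id": chat_id,
        "source_id": bot_id,
        "target_id": "you",
        "summary": confirmed["edge_seed_summary"],
        "knowledge_facts": confirmed["edge_seed_knowledge_facts"],
    }))
    return events
```

The route handler would then call append_event once per pair inside a single open_db block, so a failure part-way through leaves no half-created chat.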
Commit: feat: kickoff parse-and-confirm flow with chat creation
Phase 1D: Chat — single bot
Task 14: Top-level nav + Chat list
Persistent left rail with three sections (§16.1). Chat list pulls from chats joined with chat_state and the latest assistant_turn for snippet.
Files:
- Create: chat/templates/layout.html (extends base, adds rail)
- Create: chat/templates/chat_list.html
- Create: chat/templates/bot_list.html
- Create: chat/web/nav.py
- Modify: chat/app.py
- Create: tests/test_chat_list.py
Commit: feat: top-level nav and chat list view
Task 15: Chat shell page
/chats/<id> — renders the empty timeline + input box + drawer toggle. No turn handling yet.
Files:
- Create: chat/templates/chat.html
- Create: chat/web/chat.py
- Create: tests/test_chat_shell.py
Commit: feat: chat shell page rendering
Task 16: Per-chat SSE channel + multi-tab sync
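In-process pub/sub keyed by chat_id, broadcasting events to all subscribers. Because a single asyncio.Queue hands each item to only one consumer, each subscriber (browser tab) gets its own asyncio.Queue, registered under its chat_id. Endpoint /chats/<id>/events streams JSON events over SSE. On connect, the server pushes a snapshot event with current state; subsequent state changes push event items.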
Files:
- Create: chat/web/sse.py
- Create: chat/web/pubsub.py
- Modify: chat/web/chat.py
- Create: tests/test_sse.py
Test: TestClient streams 1 event; assert framing is correct (event: snapshot\ndata: {...}\n\n).
Commit: feat: per-chat SSE channel and pub/sub
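A minimal sketch of the fan-out and the SSE framing asserted in the test, assuming one asyncio.Queue per subscriber grouped by chat_id (a single shared queue would deliver each item to only one tab). The names ChatPubSub and format_sse are hypothetical, not part of the plan.

```python
import asyncio
import json
from collections import defaultdict


def format_sse(event: str, data: dict) -> str:
    """Frame one SSE message: event line, data line, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


class ChatPubSub:
    """Fan-out hub: one queue per subscriber, grouped by chat_id."""

    def __init__(self) -> None:
        self._subs: dict[str, set[asyncio.Queue]] = defaultdict(set)

    def subscribe(self, chat_id: str, snapshot: dict) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        # A new subscriber receives the current state before any live events.
        queue.put_nowait(format_sse("snapshot", snapshot))
        self._subs[chat_id].add(queue)
        return queue

    def unsubscribe(self, chat_id: str, queue: asyncio.Queue) -> None:
        self._subs[chat_id].discard(queue)

    def publish(self, chat_id: str, data: dict) -> None:
        frame = format_sse("event", data)
        # Every tab subscribed to this chat gets its own copy of the frame.
        for queue in self._subs[chat_id]:
            queue.put_nowait(frame)
```

The SSE endpoint would then loop on `await queue.get()` and yield each frame, calling unsubscribe when the client disconnects.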
Task 17: Turn input parser
Classifier call that splits a user turn into [dialogue|action|ooc] segments. OOC segments stripped from prompt; flagged for transcript display only.
Files:
- Create: chat/services/turn_parse.py
- Create: tests/test_turn_parse.py
Schema:
from pydantic import BaseModel

class TurnSegment(BaseModel):
    kind: str  # one of: dialogue | action | ooc
    text: str

class ParsedTurn(BaseModel):
    segments: list[TurnSegment]
Test: input *walks over* "Hey." ((player note)) → 3 segments tagged correctly. Mock classifier returns canned JSON.
Commit: feat: turn input parser via classifier
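The OOC stripping step can be illustrated with the canned JSON a mock classifier might return. This sketch uses a stdlib dataclass as a stand-in for the pydantic models so it runs without dependencies; parse_classifier_json and prompt_text are hypothetical names, and the JSON shape is an assumption that mirrors the schema above.

```python
import json
from dataclasses import dataclass

ALLOWED_KINDS = {"dialogue", "action", "ooc"}


@dataclass
class Segment:
    kind: str  # dialogue | action | ooc
    text: str


def parse_classifier_json(raw: str) -> list[Segment]:
    """Turn the classifier's JSON reply into validated segments."""
    segments = [Segment(**item) for item in json.loads(raw)["segments"]]
    for seg in segments:
        if seg.kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown segment kind: {seg.kind!r}")
    return segments


def prompt_text(segments: list[Segment]) -> str:
    """Text sent to the narrative model: OOC segments are left out."""
    return " ".join(s.text for s in segments if s.kind != "ooc")
```

The transcript renderer would still receive all three segments, so the OOC note stays visible to the user while never reaching the bot.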
Task 18: Prompt assembly with trim tiers
Implements the must/should/nice trimming tiers (§3.2) for the narrative prompt. Tokens are counted with tiktoken. Inputs: speaker_id, current chat state, witnessed memories (top-K), recent dialogue, edges, activity for all present, active scene.
Files:
- Create: chat/services/prompt.py
- Create: tests/test_prompt.py
Test: stuff a huge dialogue history, assert older turns get summarized first (NICE), then memories drop to K=2, etc. Must-include never trimmed.
Commit: feat: prompt assembly with must/should/nice trim tiers
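A simplified sketch of tiered trimming, under stated assumptions: blocks are dropped whole (the plan summarizes old turns and lowers K instead), list order is chronological, and a whitespace word count stands in for tiktoken. Block, count_tokens, and trim_to_budget are hypothetical names; §3.2 defines the real rules.

```python
from dataclasses import dataclass

TRIM_ORDER = ["nice", "should"]  # "must" is deliberately absent: never trimmed


@dataclass(eq=False)  # identity comparison, so list.remove drops this exact block
class Block:
    tier: str  # must | should | nice
    text: str


def count_tokens(text: str) -> int:
    # Stand-in for a tiktoken-based count (assumption for this sketch).
    return len(text.split())


def trim_to_budget(blocks: list[Block], budget: int) -> list[Block]:
    kept = list(blocks)
    total = sum(count_tokens(b.text) for b in kept)
    for tier in TRIM_ORDER:
        # Oldest blocks of the current tier go first.
        for block in [b for b in kept if b.tier == tier]:
            if total <= budget:
                return kept
            kept.remove(block)
            total -= count_tokens(block.text)
    # Only must-include blocks may remain; they stay even if over budget.
    return kept
```

A test along the lines of tests/test_prompt.py can then assert the order in which tiers disappear as the budget shrinks.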
Task 19: Narrative call + streaming over SSE
POST /chats/<id>/turns accepts a user prose turn. Server:
- Appends a user_turn event (raw + parsed segments).
- Appends a placeholder assistant_turn_started event.
- Streams narrative tokens over the chat's SSE channel as they arrive.
- On stream complete: appends an assistant_turn event with full text + truncated=False.
- On stream interrupt: appends an assistant_turn event with truncated=True.
Files:
- Create: chat/web/turns.py
- Modify: chat/web/sse.py (add token broadcast)
- Modify: chat/eventlog/log.py (add helpers if needed)
- Create: tests/test_turn_flow.py
Test (uses MockLLMClient): POST a turn → assert SSE channel emits token chunks then a final assistant_turn event; DB has both events.
Commit: feat: narrative streaming via SSE with assistant_turn event
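The server steps after the user_turn append could look roughly like this. It is a sketch: stream_turn, broadcast, and append_event are hypothetical callables standing in for the SSE channel and event log, and treating ConnectionError as the interrupt signal is an assumption about how the LLM client reports a dropped stream.

```python
import asyncio
from typing import AsyncIterator, Callable


async def stream_turn(
    tokens: AsyncIterator[str],
    broadcast: Callable[[str], None],
    append_event: Callable[[dict], None],
) -> None:
    append_event({"type": "assistant_turn_started"})
    parts: list[str] = []
    truncated = True  # flipped only when the stream runs to completion
    try:
        async for token in tokens:
            parts.append(token)
            broadcast(token)
        truncated = False
    except ConnectionError:
        pass  # upstream dropped mid-stream; keep the partial text
    finally:
        # Runs on completion, on interrupt, and on task cancellation alike,
        # so a partial response is always committed to the log.
        append_event(
            {"type": "assistant_turn", "text": "".join(parts), "truncated": truncated}
        )
```

Because the final append sits in the finally block, cancelling the task (for example from a Stop button) also commits the partial text before the cancellation propagates.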
Phase 1E: State updates per turn
Task 20: Post-turn state-update pass
After narrative completes, classifier extracts affinity_delta, trust_delta, knowledge_facts per (source, target) directed pair, for every present entity (silent witnesses too). Emits edge_update events.
Files:
- Create: chat/services/state_update.py
- Create: tests/test_state_update.py
Test: mock returns deltas; assert edge_update events appended; projection updates affinity.
Commit: feat: post-turn state-update pass per present entity
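How a projector might apply an edge_update for a directed pair is sketched below. The payload keys mirror the field names in the task text; integer scores without clamping, the Edge shape, and the function name apply_edge_update are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    affinity: int = 0
    trust: int = 0
    knowledge: list[str] = field(default_factory=list)


def apply_edge_update(edges: dict[tuple[str, str], Edge], event: dict) -> None:
    """Apply one edge_update event to the projected edges."""
    # Edges are directed: (source, target) and (target, source) are separate.
    key = (event["source"], event["target"])
    edge = edges.setdefault(key, Edge())
    edge.affinity += event.get("affinity_delta", 0)
    edge.trust += event.get("trust_delta", 0)
    for fact in event.get("knowledge_facts", []):
        if fact not in edge.knowledge:  # avoid duplicate facts on replay
            edge.knowledge.append(fact)
```

Keeping the handler a pure function of (state, event) is what makes replay after a rewind reproduce the same projection.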
Task 21: Memory write per turn
After narrative completes, write one memory row per witness, owned by that witness and carrying the appropriate witness flags. Phase 1 simplification: the memory's pov_summary is a snippet of the assistant's narrative text (significance defaults to 1; at scene close the classifier rewrites it into per-POV summary form). Emits memory_written events.
Files:
- Create: chat/services/memory_write.py
- Create: tests/test_memory_write.py
Commit: feat: per-turn memory writes with witness flags
Task 22: Significance pass (queued, async)
Background task: after narrative completes, runs significance classifier (0–3 per §11.1) on the turn. Updates the just-written memory's significance. Auto-pins on score 3 (with the soft-cap eviction rule from §8.5).
Files:
- Create: chat/services/significance.py
- Create: chat/services/background.py (asyncio queue worker)
- Modify: chat/app.py (lifespan starts/stops worker)
- Create: tests/test_significance.py
Test: queue a significance job for a freshly-written memory; assert significance updates and auto-pin behavior on score 3.
Commit: feat: async significance pass with auto-pin on score 3
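One possible shape for the asyncio queue worker in chat/services/background.py. BackgroundWorker and its method names are hypothetical; the significance classifier and the §8.5 pin rules would live in the handler passed in, and are not shown here.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional


class BackgroundWorker:
    """Runs queued jobs one at a time on the app's event loop."""

    def __init__(self, handler: Callable[[dict], Awaitable[None]]) -> None:
        self._handler = handler
        self._queue: Optional[asyncio.Queue] = None
        self._task: Optional[asyncio.Task] = None

    async def start(self) -> None:
        self._queue = asyncio.Queue()
        self._task = asyncio.create_task(self._run())

    def enqueue(self, job: dict) -> None:
        self._queue.put_nowait(job)

    async def _run(self) -> None:
        while True:
            job = await self._queue.get()
            try:
                await self._handler(job)
            except Exception:
                # A failed job is logged; it must not kill the worker loop.
                logging.exception("background job failed: %r", job)
            finally:
                self._queue.task_done()

    async def stop(self) -> None:
        await self._queue.join()  # let already-queued jobs finish
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass
```

The FastAPI lifespan hook would call start() before serving and stop() on shutdown, which matches the "lifespan starts/stops worker" note in the file list.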
Task 23: Memory retrieval (FTS5, witness-filtered, top-K)
Implements search_memories(owner_id, witness_role, query, k) via FTS5 with WHERE filter on the witness column. Recency + significance boost in ranking.
Files:
- Modify: chat/state/memory.py
- Create: tests/test_memory_search.py
Test: seed memories with mixed witness flags; assert filter excludes non-witnessed; assert recency boost orders newer above older.
Commit: feat: FTS5 memory retrieval with witness filter and ranking boosts
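A sketch of the retrieval query, assuming the local SQLite build ships with FTS5. The table layout, the witness_role column, the boost weights (0.5 per significance point, a one-day recency scale), and the extra now parameter are all assumptions for illustration, not the plan's schema. FTS5's bm25() returns lower values for better matches, so the boosts are subtracted.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE memories (
            id INTEGER PRIMARY KEY, owner_id TEXT, witness_role TEXT,
            significance INTEGER, created_at REAL, pov_summary TEXT);
        CREATE VIRTUAL TABLE memories_fts USING fts5(pov_summary);
    """)
    return db


def add_memory(db, owner_id, witness_role, significance, created_at, text) -> int:
    cur = db.execute(
        "INSERT INTO memories (owner_id, witness_role, significance, created_at,"
        " pov_summary) VALUES (?, ?, ?, ?, ?)",
        (owner_id, witness_role, significance, created_at, text),
    )
    # Keep the FTS row keyed by the memory id so the two tables can be joined.
    db.execute(
        "INSERT INTO memories_fts (rowid, pov_summary) VALUES (?, ?)",
        (cur.lastrowid, text),
    )
    return cur.lastrowid


def search_memories(db, owner_id, witness_role, query, k, now) -> list[int]:
    rows = db.execute(
        """
        SELECT m.id
        FROM memories_fts
        JOIN memories AS m ON m.id = memories_fts.rowid
        WHERE memories_fts MATCH :q
          AND m.owner_id = :owner
          AND m.witness_role = :role
        ORDER BY bm25(memories_fts)
                 - 0.5 * m.significance
                 - 1.0 / (1.0 + (:now - m.created_at) / 86400.0)
        LIMIT :k
        """,
        {"q": query, "owner": owner_id, "role": witness_role, "now": now, "k": k},
    ).fetchall()
    return [row[0] for row in rows]
```

User-supplied query text would need escaping or quoting before being passed to MATCH, since FTS5 interprets operators such as AND, OR, and quotes.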
Phase 1F: Drawer & state ops
Task 24: Drawer read-only skeleton
Right-side drawer rendered as a partial; HTMX-loaded into the chat page. Shows current scene, container, activity per entity, edges (host ↔ you), recent witnessed memories with significance markers, pinned memories with n/8 counter.
Files:
- Create: chat/templates/drawer.html
- Create: chat/web/drawer.py
- Modify: chat/templates/chat.html (drawer toggle + container)
- Modify: chat/static/app.css
- Create: tests/test_drawer_render.py
Commit: feat: read-only drawer with scene, activity, edges, memories
Task 25: Drawer edits (activity / edges / memory)
Inline edit affordances on activity, edge fields, memory pov_summary/significance/pin. Each edit emits a manual_edit event with prior value snapshotted (per §6.4 final paragraph). Pin toggle emits memory_pin_changed event.
Files:
- Modify: chat/web/drawer.py
- Modify: chat/templates/drawer.html
- Create: chat/state/manual_edit.py (handler for manual_edit event)
- Create: tests/test_drawer_edits.py
Test: edit affinity slider via POST; assert manual_edit event written with prior + new value; projected affinity updated.
Commit: feat: drawer edits with manual_edit event capture
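Capturing the prior value at edit time might be sketched as below. The path-based payload and both function names are hypothetical; §6.4 is the authority on what a manual_edit event actually carries.

```python
import copy


def make_manual_edit(state: dict, path: tuple[str, ...], new_value) -> dict:
    """Build a manual_edit event that snapshots the value being replaced."""
    node = state
    for key in path[:-1]:
        node = node[key]
    return {
        "type": "manual_edit",
        "path": list(path),
        # Deep copy so later mutations of state cannot alter the snapshot.
        "prior": copy.deepcopy(node[path[-1]]),
        "new": new_value,
    }


def apply_manual_edit(state: dict, event: dict) -> None:
    """Projector side: write the new value at the recorded path."""
    node = state
    for key in event["path"][:-1]:
        node = node[key]
    node[event["path"][-1]] = event["new"]
```

Storing both values in the event means the log alone is enough to show what an edit changed, without consulting an earlier projection.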
Task 26: Scene close (hard signals + manual button)
Hard-signal detection runs as a small classifier call after each turn (queued/cheap): does the prose indicate container change, explicit "we're done here" pattern, or other hard signal? Manual close button in drawer always available. On close, emit scene_closed event; reopen via scene_opened for the new scene.
Files:
- Create: chat/services/scene_close.py
- Modify: chat/web/turns.py
- Modify: chat/web/drawer.py (manual close button)
- Create: tests/test_scene_close.py
Test: simulate prose "we drove to the park"; assert classifier returns container_change=true; assert scene_closed then scene_opened events written.
Commit: feat: scene close on hard signals with manual override
Task 27: Per-POV summary on close
On scene_closed, classifier writes a per-POV summary for each present witness (Phase 1: just the host bot since we're single-bot). Updates the existing memory rows for that scene, replacing terse pov_summary with a proper scene-level summary. Updates edge summary from the per-POV summary + prior summary. Promotion rules apply (§11.3).
Files:
- Create: chat/services/scene_summarize.py
- Modify: chat/eventlog/projector.py if needed for scene_closed handler
- Create: tests/test_per_pov_summary.py
Commit: feat: per-POV summary and edge summary update on scene close
Phase 1G: Rollback
Task 28: Rewind UI + impact preview + pre-rewind snapshot
"Rewind to here" button on each turn in the chat. Computes impact preview (count messages, scene transitions, edge updates, memories, fired events affected). Pre-rewind snapshot written to data/snapshots/rewind/. On confirm: truncate event_log past selected event, drop projected tables, replay events up to selected. 30-second undo toast.
Files:
- Create: chat/services/rewind.py
- Create: chat/services/snapshot.py
- Create: chat/templates/rewind_modal.html
- Modify: chat/web/turns.py
- Create: tests/test_rewind.py
Test: play 5 turns; rewind to turn 2; assert the events for turns 3-5 are removed, projected state matches state-at-turn-2, snapshot file exists.
Commit: feat: rewind with impact preview, pre-rewind snapshot, undo toast
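The truncate-and-replay core of rewind can be sketched against a toy schema. The event_log columns, the single projected turns table, and the function names are assumptions; the pre-rewind snapshot and the undo toast are left out of this sketch.

```python
import json
import sqlite3


def project(db: sqlite3.Connection) -> None:
    """Drop the projected table and rebuild it by replaying the event log."""
    db.execute("DROP TABLE IF EXISTS turns")
    db.execute("CREATE TABLE turns (seq INTEGER PRIMARY KEY, text TEXT)")
    events = db.execute("SELECT seq, payload FROM event_log ORDER BY seq").fetchall()
    for seq, payload in events:
        db.execute("INSERT INTO turns VALUES (?, ?)", (seq, json.loads(payload)["text"]))


def rewind(db: sqlite3.Connection, to_seq: int) -> int:
    """Remove events after to_seq, replay the rest, return how many were removed."""
    # The count doubles as a very small impact preview.
    removed = db.execute(
        "SELECT COUNT(*) FROM event_log WHERE seq > ?", (to_seq,)
    ).fetchone()[0]
    with db:  # truncate and re-projection commit together or not at all
        db.execute("DELETE FROM event_log WHERE seq > ?", (to_seq,))
        project(db)
    return removed
```

Rebuilding from the log instead of patching projected rows keeps the post-rewind state identical to what a fresh replay would produce.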
Task 29: Regenerate (inline edit-then-regenerate)
Button on the latest assistant_turn. Click puts your prior user_turn into inline edit mode; submit either appends user_turn_edit (if edited) then a new assistant_turn, or just a new assistant_turn (if not edited). The previous assistant_turn is marked superseded_by the new one. Display hides superseded turns.
Files:
- Create: chat/services/regenerate.py
- Modify: chat/web/turns.py
- Modify: chat/templates/chat.html (regenerate button + edit-state HTMX swaps)
- Create: tests/test_regenerate.py
Test: regenerate without edit → new assistant_turn, prior superseded, projected state reflects new only. With edit → also a user_turn_edit event.
Commit: feat: regenerate with edit-then-regenerate inline UX
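The supersede-and-hide behavior might reduce to something like the following, using plain dicts as stand-ins for projected turn rows. The field layout and both function names are hypothetical.

```python
def regenerate(turns: list[dict], new_turn: dict) -> None:
    """Mark the latest live assistant_turn as superseded, then add the new one."""
    prior = next(
        t for t in reversed(turns)
        if t["type"] == "assistant_turn" and t.get("superseded_by") is None
    )
    prior["superseded_by"] = new_turn["id"]
    turns.append(new_turn)


def visible_turns(turns: list[dict]) -> list[dict]:
    """Turns shown in the transcript: superseded ones are hidden, not deleted."""
    return [t for t in turns if t.get("superseded_by") is None]
```

Since superseded turns stay in the projection, the event log keeps a full record of every generation even though the display shows only the latest.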
Task 30: Reset bot (hard confirm)
/bots/<id>/reset → modal requiring you to type the bot's name. On confirm: emit bot_reset event. Handler purges the bot's chat_state, scenes, containers, activities, memories, edges-involving-this-bot. Identity, initial-relationship, kickoff prose preserved. Chat sits ready (no auto kickoff replay; next user message triggers it).
Files:
- Create: chat/services/reset.py
- Modify: chat/web/bots.py
- Modify: chat/templates/bot_list.html (reset button)
- Create: tests/test_reset.py
Test: play, reset, assert all transient state for that bot is gone, identity remains.
Commit: feat: bot reset with hard confirm and event-driven purge
Phase 1H: Ops & polish
Task 31: Periodic snapshots
Every 100 events OR every 30 minutes since last snapshot, write a full-state JSON to data/snapshots/periodic/. Retain last 5. On cold load (app start), if a periodic snapshot exists, apply it then replay events past it.
Files:
- Modify: chat/services/snapshot.py
- Modify: chat/services/background.py (periodic timer)
- Create: tests/test_snapshot.py
Commit: feat: periodic snapshots with retention and cold-load fast-path
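The trigger and retention rules can be sketched as two small functions. The file naming scheme, the JSON layout, and the function names are assumptions; the 100-event, 30-minute, and keep-5 values come from the task text.

```python
import json
from pathlib import Path


def snapshot_due(events_since: int, seconds_since: float,
                 every_events: int = 100, every_seconds: float = 30 * 60) -> bool:
    """True once either threshold since the last snapshot is reached."""
    return events_since >= every_events or seconds_since >= every_seconds


def write_snapshot(directory: Path, last_seq: int, state: dict, keep: int = 5) -> Path:
    """Write a full-state snapshot and prune all but the newest `keep` files."""
    directory.mkdir(parents=True, exist_ok=True)
    # Zero-padded sequence numbers make lexical order match event order.
    path = directory / f"snapshot-{last_seq:012d}.json"
    path.write_text(json.dumps({"last_seq": last_seq, "state": state}))
    for old in sorted(directory.glob("snapshot-*.json"))[:-keep]:
        old.unlink()
    return path
```

Recording last_seq inside the snapshot gives the cold-load path the point from which to replay later events.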
Task 32: Nightly backups
Simple in-process scheduler: at 03:00 local time daily, copy chat.db to data/backups/chat-<timestamp>.db. Retain last 14. Suitable for v1; launchd plist can replace later.
Files:
- Create: chat/services/backup.py
- Modify: chat/services/background.py
- Create: tests/test_backup.py
Commit: feat: nightly DB backups with 14-day retention
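A sketch of the schedule calculation and the backup step. It swaps the plain file copy for sqlite3's Connection.backup, because copying a WAL-mode database file directly can miss pages that still sit in the -wal file. The timestamp format and function names are assumptions, and datetimes are treated as naive local time.

```python
import sqlite3
from datetime import datetime, time, timedelta
from pathlib import Path


def next_run(now: datetime, at: time = time(3, 0)) -> datetime:
    """Next occurrence of the daily run time strictly after `now`."""
    candidate = datetime.combine(now.date(), at)
    return candidate if candidate > now else candidate + timedelta(days=1)


def backup_db(src_path: Path, backup_dir: Path, now: datetime, keep: int = 14) -> Path:
    """Back up the database and prune all but the newest `keep` backups."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    dest = backup_dir / f"chat-{now:%Y%m%d-%H%M%S}.db"
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dest)
    try:
        src.backup(dst)  # online backup API: consistent copy of a live database
    finally:
        dst.close()
        src.close()
    # Timestamped names sort chronologically, so the oldest come first.
    for old in sorted(backup_dir.glob("chat-*.db"))[:-keep]:
        old.unlink()
    return dest
```

The background worker would sleep until next_run(datetime.now()), call backup_db, and repeat.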
Task 33: Display formatting
Renderer for transcript turns. Lightweight markdown (paragraphs, italic, bold, blockquotes — no headings/code). *action* rendered as italic in narrative output. OOC ((parens)) rendered dimmed/italic/smaller, never sent to bot. Speaker labels bold.
Files:
- Create: chat/web/render.py
- Modify: chat/templates/chat.html (use render filters)
- Modify: chat/static/app.css
- Create: tests/test_render.py
Test: input prose with all marker types → expected HTML output.
Commit: feat: transcript display formatting with markdown and OOC styling
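A reduced sketch of the renderer covering escaping, OOC spans, bold, italic, and paragraphs; blockquotes are omitted. The CSS class names, the HTML wrapper, and the function name render_turn are hypothetical.

```python
import html
import re


def render_turn(speaker: str, prose: str) -> str:
    """Render one transcript turn to HTML (subset of the planned markup)."""
    # Escape first so user prose can never inject tags.
    text = html.escape(prose, quote=False)
    # ((ooc)) spans get a class the stylesheet can dim and shrink.
    text = re.sub(r"\(\((.+?)\)\)", r'<span class="ooc">((\1))</span>', text)
    # Bold must be handled before italic, or ** would be read as two * pairs.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    # Blank lines separate paragraphs.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "".join(f"<p>{p}</p>" for p in paragraphs)
    label = f'<strong class="speaker">{html.escape(speaker)}</strong>'
    return f'<div class="turn">{label}{body}</div>'
```

Registering this as a Jinja2 filter would need the output marked safe, which is only sound because the input is escaped before any tags are added.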
Task 34: Streaming UX (typing indicator, Stop, mid-stream disconnect)
Stop button on streaming bot row aborts the in-flight Featherless request and commits partial as assistant_turn with truncated=true. SSE client handles disconnect: server detects channel close, commits whatever was streamed, surfaces "connection lost — partial response saved" banner with Regenerate button. Send button disabled while streaming.
Files:
- Modify: chat/web/turns.py
- Modify: chat/templates/chat.html
- Modify: chat/static/app.css
- Create: tests/test_streaming_ux.py
Commit: feat: streaming UX with Stop, disconnect handling, send-lock
Task 35: Error UX banners + first-run flow
Error banners (per §16.5): Featherless 401/429/5xx surface inline with Retry. DB write failures show modal-blocking error. Schema migration failure on startup logs to stderr and exits non-zero.
First-run flow: if you_entity missing, redirect to /settings after first navigation. If bots empty, after settings save, redirect to /bots/new. After bot creation + kickoff confirm, land in chat.
Files:
- Create: chat/web/middleware.py (first-run redirect)
- Create: chat/templates/errors.html
- Modify: chat/web/turns.py (catch Featherless errors)
- Modify: chat/app.py (mount middleware, error handlers)
- Create: tests/test_first_run.py
- Create: tests/test_error_ux.py
Commit: feat: error banners and first-run navigation flow
Wrap-up
After T35, run the full test suite and a manual smoke test:
pytest -v
uvicorn chat.app:app --reload
# In a browser: walk through first-run, author a bot with kickoff,
# play 10 turns, open the drawer, edit an edge, close a scene, rewind, regenerate.
# Open a second tab on the same chat, verify multi-tab sync.
Update CLAUDE.md to reflect the v1 surface that actually shipped (any tasks deferred to Phase 1.5, any choices that shifted during implementation).
Merge phase-1 into main with a single squash commit referencing this plan.
Notes for the executor
- Verify before claiming done (superpowers-extended-cc:verification-before-completion): every task ends with running its test command and reading the output. "Tests should pass" is not enough; show the green output.
- DRY ruthlessly but don't pre-extract: if two tasks need similar code, inline both first, then refactor in a third commit. Premature abstraction breaks the TDD rhythm.
- YAGNI: don't pre-build for Phase 2 (multi-bot, guests, group node) until those tasks exist.
- Frequent commits: one per task minimum, more if a task naturally splits.
- Don't bypass the event log. Any state change goes through an event. If a test wants to seed state directly, it's still appending events and projecting, not running INSERT INTO bots directly. (Exception: schema migrations themselves.)
- API key safety: never log the Featherless API key, never write it to event payloads, never include it in error messages.