Phase 1: v1 single-bot roleplay engine #1

2026-04-26T14:40:25-04:00

dohertj2 commented

2026-04-26 14:40:25 -04:00

Summary

Phase 1 of the Roleplay Engine, end-to-end. 38 commits, 168 tests passing, schema version 7.

The single-bot core loop is functional: bot authoring with kickoff parse-and-confirm, streaming turns over SSE with multi-tab sync, drawer rendering with edit affordances, scene-close detection with per-POV summary rewrites, edge updates per turn, FTS5 retrieval with witness filter + ranking boosts, periodic snapshots + nightly backups, rewind/regenerate/reset, first-run navigation, friendly 404/500 pages.
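Here is a minimal sketch of FTS5 retrieval with a witness filter and ranking boosts. It assumes a hypothetical `memories` table with one witness flag per POV, a `pinned` column, and made-up boost weights. The PR doesn't show the real schema or weights. SQLite's `bm25()` is lower-is-better, so the query negates it before the boosts are added.

```python
import sqlite3

# Hypothetical weights: the PR mentions "ranking boosts" but not their values.
PIN_BOOST = 1.0
RECENCY_WEIGHT = 0.5
# Whitelisted POV -> witness-flag column (never interpolate raw user input into SQL).
WITNESS_COLUMNS = {"bot": "bot_witnessed", "you": "you_witnessed"}

SCHEMA = """
CREATE TABLE memories (
    id INTEGER PRIMARY KEY,
    text TEXT NOT NULL,
    bot_witnessed INTEGER NOT NULL,
    you_witnessed INTEGER NOT NULL,
    pinned INTEGER NOT NULL DEFAULT 0,
    turn INTEGER NOT NULL
);
CREATE VIRTUAL TABLE memories_fts USING fts5(text);
"""


def add_memory(conn, text, *, bot, you, turn, pinned=False):
    cur = conn.execute(
        "INSERT INTO memories (text, bot_witnessed, you_witnessed, pinned, turn) "
        "VALUES (?, ?, ?, ?, ?)",
        (text, int(bot), int(you), int(pinned), turn),
    )
    # Index the same text in FTS5 under the same rowid so hits can be joined back.
    conn.execute("INSERT INTO memories_fts (rowid, text) VALUES (?, ?)", (cur.lastrowid, text))
    return cur.lastrowid


def retrieve(conn, query, pov, current_turn, limit=5):
    flag = WITNESS_COLUMNS[pov]
    rows = conn.execute(
        f"""
        SELECT m.text, -bm25(memories_fts) AS relevance, m.pinned, m.turn
        FROM memories_fts JOIN memories AS m ON m.id = memories_fts.rowid
        WHERE memories_fts MATCH ? AND m.{flag} = 1
        """,
        (query,),
    ).fetchall()

    def score(row):
        _, relevance, pinned, turn = row
        # Relevance (bm25 negated above) plus a pin boost and a recency boost.
        recency = 1.0 / (1 + current_turn - turn)
        return relevance + (PIN_BOOST if pinned else 0.0) + RECENCY_WEIGHT * recency

    return [row[0] for row in sorted(rows, key=score, reverse=True)[:limit]]
```

A POV only sees memories it witnessed. Among those, pinned memories (for example, ones auto-pinned on significance score 3) outrank memories that are merely recent.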

What's in this PR

Phase 1A (T0–T4): foundation — project skeleton, config loader, SQLite migrations, LLM client (Featherless + Mock), classifier wrapper.
Phase 1B (T5–T9): event log + projector, bot/you entities, edges, memory + FTS5, world tables.
Phase 1C (T10–T13): kickoff prose parser, bot authoring form, settings (you-entity), kickoff parse-and-confirm.
Phase 1D (T14–T19): top-level nav, chat shell, SSE channel, turn parser, prompt assembly with trim tiers, narrative streaming.
Phase 1E (T20–T23): post-turn state-update pass + append_and_apply, per-turn memory writes, async significance pass + auto-pin, ranked retrieval.
Phase 1F (T24–T27): drawer + drawer edits + scene close + per-POV summary on close.
Phase 1G (T28–T30): rewind w/ snapshot, regenerate, bot reset.
Phase 1H (T31–T35): periodic snapshots + cold-load, nightly backups, display formatting, streaming UX, first-run + friendly errors.
Cleanup (365dacc): *.egg-info/ gitignored, pip install -e . packaging fix, Phase 1.5 backlog documented in CLAUDE.md.
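The event log + projector pattern from Phase 1B/1E can be sketched roughly as follows. This is an illustration, not the project's code. The table layout, the `edge_delta` event name, and the `rebuild` helper are hypothetical; only `append_and_apply` and `bot_authored` appear in this PR. Appending the event and applying its projection in one transaction keeps the log and the derived tables from drifting apart. Replaying a prefix of the log is one way to support rewind.

```python
import json
import sqlite3
from typing import Callable, Dict, Optional

SCHEMA = """
CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, type TEXT NOT NULL, payload TEXT NOT NULL);
CREATE TABLE bots (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL, affinity INTEGER NOT NULL,
                    PRIMARY KEY (src, dst));
"""

Handler = Callable[[sqlite3.Connection, dict], None]
HANDLERS: Dict[str, Handler] = {}


def handles(event_type: str):
    """Register a projector handler for one event type."""
    def register(fn: Handler) -> Handler:
        HANDLERS[event_type] = fn
        return fn
    return register


@handles("bot_authored")
def _bot_authored(conn, payload):
    conn.execute("INSERT INTO bots (id, name) VALUES (?, ?)", (payload["id"], payload["name"]))


@handles("edge_delta")  # hypothetical event name for a per-turn edge change
def _edge_delta(conn, payload):
    conn.execute(
        "INSERT INTO edges (src, dst, affinity) VALUES (?, ?, ?) "
        "ON CONFLICT (src, dst) DO UPDATE SET affinity = affinity + excluded.affinity",
        (payload["src"], payload["dst"], payload["delta"]),
    )


def append_and_apply(conn: sqlite3.Connection, event_type: str, payload: dict) -> int:
    # One transaction: if the handler raises, the event row is rolled back too.
    with conn:
        cur = conn.execute(
            "INSERT INTO events (type, payload) VALUES (?, ?)", (event_type, json.dumps(payload))
        )
        HANDLERS[event_type](conn, payload)
    return cur.lastrowid


def rebuild(conn: sqlite3.Connection, upto: Optional[int] = None) -> None:
    """Recompute projections from the log, optionally only up to event id `upto`."""
    with conn:
        conn.execute("DELETE FROM bots")
        conn.execute("DELETE FROM edges")
        rows = conn.execute(
            "SELECT type, payload FROM events WHERE ? IS NULL OR id <= ? ORDER BY id", (upto, upto)
        ).fetchall()
        for event_type, payload in rows:
            HANDLERS[event_type](conn, json.loads(payload))
```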

Design references

Architecture: rp-engine-design.md
v1 requirements: docs/plans/2026-04-26-v1-requirements-design.md
Implementation plan: docs/plans/2026-04-26-v1-phase1-implementation.md

Test plan

.venv/bin/pytest — 168 passed
.venv/bin/pip install -e . — succeeds (post-cleanup)
Manual smoke: copy data/config.example.toml → data/config.toml with real Featherless key, uvicorn chat.app:app --reload, walk through first-run (settings → bot author → kickoff confirm → chat), play 5 turns, open drawer, edit affinity, close scene, rewind, regenerate, multi-tab sync.
Manual smoke: reset a bot via the bot-list reset form.

Phase 1.5 cleanup backlog

Documented in CLAUDE.md. None blocking. Track via that file rather than separate issues.

Out of scope (Phase 2+)

Multi-bot scenes (guest, group node, scene configs)
Time skips, events with lifecycles, active threads
Vector retrieval, branching UI, surgical delete


dohertj2 added 38 commits 2026-04-26 14:40:26 -04:00

feat: project skeleton with health endpoint 4a60171035

feat: config loader with toml + env override 01e6975d20

feat: sqlite migration runner with meta version table 67517926aa

feat: LLMClient protocol with Featherless and mock implementations e627356168

feat: classifier wrapper with retry, timeout, schema-default fallback c2aceffda1

feat: append-only event log with projector skeleton 517fe49aef

feat: bot and you entity schemas with projector handlers 5e6bbb586c

docs: fix Task 6 plan snippet: PRAGMA table_info name index is c[1] not c[0] 7e6c2985dd

feat: directed edges with per-turn delta projector bc97d425ef

feat: memory schema with witness flags and FTS5 index 30e6648122

feat: chats, chat_state, containers, scenes, activity tables ec344064f1

feat: kickoff prose parser via classifier a5339fc1d2

feat: bot authoring form with bot_authored event 44ea627a8a

feat: settings page with you-entity authoring e44e2bf93f

feat: kickoff parse-and-confirm flow with chat creation fbb16c86b3

feat: top-level nav and chat list view 0c08745194

feat: chat shell page rendering e79f4d8d22

feat: per-chat SSE channel and pub/sub 656c2558cb

feat: turn input parser via classifier a0f5e818ec

feat: prompt assembly with must/should/nice trim tiers 73d8b0c092

feat: narrative streaming via SSE with assistant_turn event 9b45710cb1

feat: post-turn state-update pass per present entity e8d24a0875

feat: per-turn memory writes with witness flags a45dabb6ae

feat: async significance pass with auto-pin on score 3 eb4cdf9cbb

feat: FTS5 memory retrieval with witness filter and ranking boosts 3995a8671b

feat: read-only drawer with scene, activity, edges, memories 5fc5b8ac23

feat: drawer edits with manual_edit event capture db3005fc17

feat: scene close on hard signals with manual override 0997562e75

feat: per-POV summary and edge summary update on scene close b5175aefaa

feat: rewind with impact preview, pre-rewind snapshot, undo toast aa0563b4fa

feat: regenerate with edit-then-regenerate inline UX 46062973c2

feat: bot reset with hard confirm and event-driven purge 82be8b3f51

feat: periodic snapshots with retention and cold-load fast-path b9644fad31

feat: nightly DB backups with 14-day retention 8390703b73

feat: transcript display formatting with markdown and OOC styling 330077afcf

feat: streaming UX with Stop, disconnect handling, send-lock 0353d592cd

feat: error banners and first-run navigation flow a302ed427a

chore: post-Phase-1 cleanup — gitignore, packaging, backlog 365dacc0d0

- .gitignore: add *.egg-info/ so editable installs don't show in git status.
- pyproject.toml: add [build-system] and [tool.setuptools.packages.find]
  scoped to chat*, fixing pip install -e . which was failing on data/
  auto-discovery.
- CLAUDE.md: add Phase 1.5 cleanup backlog section under Phase 1 status,
  capturing the small follow-ups surfaced in implementer reviews
  (open_db refactor, regenerate SSE broadcast, you-activity purge,
  drawer edits for deferred fields, NICE trim order).
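Two early commits touch SQLite schema handling: the migration runner with a meta version table (67517926aa) and the plan fix noting that `PRAGMA table_info` puts the column name at index 1 (7e6c2985dd). Below is a sketch of both, using hypothetical migrations and a hypothetical `meta` key name. Each migration runs inside an explicit BEGIN/COMMIT, so the DDL and the version bump land together (SQLite DDL is transactional).

```python
import sqlite3
from typing import List

# Hypothetical migrations; the real project ships numbered migration files.
MIGRATIONS = [
    (1, "CREATE TABLE bots (id TEXT PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE bots ADD COLUMN persona TEXT NOT NULL DEFAULT ''"),
]


def current_version(conn: sqlite3.Connection) -> int:
    conn.execute("CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT NOT NULL)")
    row = conn.execute("SELECT value FROM meta WHERE key = 'schema_version'").fetchone()
    return int(row[0]) if row else 0


def migrate(conn: sqlite3.Connection) -> int:
    version = current_version(conn)
    for number, sql in MIGRATIONS:
        if number <= version:
            continue  # already applied
        conn.execute("BEGIN")
        try:
            conn.execute(sql)
            conn.execute(
                "INSERT OR REPLACE INTO meta (key, value) VALUES ('schema_version', ?)",
                (str(number),),
            )
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
        version = number
    return version


def column_names(conn: sqlite3.Connection, table: str) -> List[str]:
    # Rows are (cid, name, type, notnull, dflt_value, pk): the name is c[1], not c[0].
    # PRAGMA arguments can't be bound as parameters, so pass trusted table names only.
    return [c[1] for c in conn.execute(f"PRAGMA table_info({table})")]
```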

dohertj2 added 1 commit 2026-04-26 14:50:10 -04:00

chore: add scripts/seed_sample_bots.py 12502d6ec7

Idempotent seeder for three sample bots (Maya — coworker slow-burn,
Eli — live-in partner, Sam — bartender / new connection). Each is a
distinct relational archetype to exercise the system from different
angles. Run from repo root:

    .venv/bin/python scripts/seed_sample_bots.py

Re-running skips ids that already exist. After seeding, walk each bot
through kickoff parse-and-confirm at /bots/<id>/kickoff.
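The "re-running skips ids that already exist" behavior amounts to a check-then-insert. Here is a minimal sketch, assuming a plain `bots` table with a hypothetical `archetype` column. The real script may go through the `bot_authored` event instead of writing rows directly.

```python
import sqlite3
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SampleBot:
    id: str
    name: str
    archetype: str


SAMPLE_BOTS = (
    SampleBot("maya", "Maya", "coworker slow-burn"),
    SampleBot("eli", "Eli", "live-in partner"),
    SampleBot("sam", "Sam", "bartender / new connection"),
)


def seed(conn: sqlite3.Connection, bots: Iterable[SampleBot] = SAMPLE_BOTS) -> List[str]:
    """Insert any sample bots whose id is not already present; return the ids created."""
    existing = {row[0] for row in conn.execute("SELECT id FROM bots")}
    created: List[str] = []
    with conn:
        for bot in bots:
            if bot.id in existing:
                continue  # idempotent: re-running skips ids that already exist
            conn.execute(
                "INSERT INTO bots (id, name, archetype) VALUES (?, ?, ?)",
                (bot.id, bot.name, bot.archetype),
            )
            created.append(bot.id)
    return created
```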

dohertj2 added 1 commit 2026-04-26 15:03:17 -04:00

fix: classifier robustness — schema in prompt, retries, kickoff fallback 5aab98e4d7

The kickoff parse-and-confirm route was 500-ing intermittently because
Hermes-3 + Featherless's response_format={"type":"json_object"} only
guarantees JSON output, NOT a particular schema. The model was inventing
its own field names (sceneTime, entities, settingDetails) instead of
the KickoffParse fields, causing Pydantic validation to fail on both
classify() retries.

Three changes:

1. Include the Pydantic JSON schema in the system prompt so the model
   knows exactly which keys to produce. Affects every classify() call
   (kickoff parse, turn parse, scene-close detect, significance,
   state-update, scene summarize). Strip ```json fences if the model
   wraps its output. Bump retries 2 → 3 (model is stochastic; one extra
   attempt closes most of the remaining gap).

2. parse_kickoff() now passes a default empty KickoffParse so the
   route degrades to a fillable form instead of 500 when the classifier
   ultimately fails. The confirm form is the human-in-the-loop; an
   empty form is strictly better UX than a stack trace.

3. Tests updated: bumped canned-failure arrays from 2 → 3 entries to
   match the new attempt count; renamed kickoff test from
   "raises_when_classifier_fails_twice" to
   "falls_back_to_empty_when_classifier_fails" reflecting the new
   degraded-but-usable behavior.

Verified live with all 3 sample bots (maya/eli/sam) — kickoff route
returns 200 across multiple attempts. Full suite: 168 passed.
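The three changes can be sketched with the standard library alone. This is an illustration under several assumptions. A dataclass stands in for the Pydantic `KickoffParse`, whose real fields aren't listed here. `json_schema()` is a hand-rolled stand-in for Pydantic's generated schema. `call_model` is a hypothetical synchronous callable that takes the place of the async LLM client.

```python
import json
import re
from dataclasses import dataclass, fields
from typing import Callable, Optional, Type, TypeVar

T = TypeVar("T")
# Matches a whole reply wrapped in ``` or ```json fences and captures the body.
FENCE_RE = re.compile(r"^\s*```(?:json)?\s*(.*?)\s*```\s*$", re.DOTALL)


@dataclass
class KickoffParse:
    """Hypothetical three-field stand-in; the real KickoffParse is a Pydantic model."""
    location: str = ""
    time_of_day: str = ""
    mood: str = ""


def json_schema(cls) -> dict:
    # Minimal schema (every field a string); Pydantic generates the real one.
    names = [f.name for f in fields(cls)]
    return {"type": "object", "properties": {n: {"type": "string"} for n in names}, "required": names}


def strip_fences(text: str) -> str:
    match = FENCE_RE.match(text)
    return match.group(1) if match else text.strip()


def classify(call_model: Callable[[str, str], str], cls: Type[T], user_text: str,
             *, attempts: int = 3, default: Optional[T] = None) -> T:
    # 1. Put the schema in the system prompt so the model knows which keys to emit.
    system = ("Reply with one JSON object that matches this JSON schema, using exactly these keys:\n"
              + json.dumps(json_schema(cls)))
    required = {f.name for f in fields(cls)}
    for _ in range(attempts):
        raw = call_model(system, user_text)
        try:
            data = json.loads(strip_fences(raw))
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            continue
        if isinstance(data, dict) and required.issubset(data):
            return cls(**{name: data[name] for name in required})
        # Otherwise the model invented its own keys; try again.
    # 2. Degrade to an empty, fillable form instead of raising into a 500.
    if default is not None:
        return default
    raise RuntimeError(f"classifier failed after {attempts} attempts")
```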

dohertj2 added 1 commit 2026-04-26 15:15:16 -04:00

fix: classifier timeout + Featherless concurrency cap 5c039c8e56

Two related issues blocking real-world use of the kickoff parse:

1. Classifier calls take ~12s end-to-end on Featherless for the
   complex KickoffParse schema (Hermes-3-8B generating ~1.3KB of
   structured JSON). The 10s timeout was firing on most attempts,
   causing all 3 retries to time out and the empty-fallback to render
   with blank form values. Bumping the default
   classifier_timeout_s 10 → 30s gives generous headroom; measured
   p99 is ~13s, so 30s is comfortable.

2. Featherless caps concurrent connections per account (2 on free /
   lower paid tiers). Each turn flow can fire 4–5 calls (parse,
   scene-close detect, narrative stream, two state-update passes)
   plus the background significance worker. Without a gate, we'd
   exceed the cap and fail.

   Added a class-level ``asyncio.Semaphore`` to FeatherlessClient,
   shared across all instances, configured once in lifespan from
   ``Settings.featherless_max_concurrent`` (default 2). Both
   ``generate`` and ``stream`` acquire the semaphore for the duration
   of the call; the stream holds it until the async generator
   completes, so token streaming is correctly accounted for.

Verified live: 4/4 sequential kickoff parses for the same bot all
succeed with real parsed values (previously ~50% blank-fallback).
Full suite: 168 passed.
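Here is a sketch of the class-level gate described above. `GatedClient`, `configure()`, and `_call_api()` are hypothetical names, and `asyncio.sleep` replaces the HTTP call and SSE chunks. The sketch shows that `stream()` takes the semaphore inside the async generator, so the slot is released only when iteration finishes.

```python
import asyncio
from typing import AsyncIterator, ClassVar, Optional


class GatedClient:
    """Hypothetical stand-in for FeatherlessClient showing only the concurrency gate."""

    # One semaphore stored on the base class, so every instance shares one account-wide cap.
    _semaphore: ClassVar[Optional[asyncio.Semaphore]] = None

    @staticmethod
    def configure(max_concurrent: int) -> None:
        """Call once at startup (e.g. from the app lifespan) with featherless_max_concurrent."""
        GatedClient._semaphore = asyncio.Semaphore(max_concurrent)

    @staticmethod
    def _gate() -> asyncio.Semaphore:
        if GatedClient._semaphore is None:
            raise RuntimeError("GatedClient.configure() was never called")
        return GatedClient._semaphore

    async def _call_api(self, prompt: str) -> str:
        await asyncio.sleep(0.01)  # placeholder for the real HTTP request
        return prompt.upper()

    async def generate(self, prompt: str) -> str:
        async with self._gate():
            return await self._call_api(prompt)

    async def stream(self, prompt: str) -> AsyncIterator[str]:
        # The slot stays held until the generator finishes, so a long token
        # stream counts against the cap for its whole duration.
        async with self._gate():
            for token in prompt.split():
                await asyncio.sleep(0.01)  # placeholder for waiting on the next chunk
                yield token
```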

dohertj2 added 1 commit 2026-04-26 15:20:06 -04:00

fix: reject empty prose on turn submit 52555e0455

Empty submission was producing a blank user_turn event in the log and
firing the LLM stream anyway — the bot would invent a response from the
kickoff context alone, producing a monologue with no user input. Two-
layer fix:

- Browser: add `required` to the prose textarea in chat.html so the
  form refuses to submit empty.
- Server: 400 in post_turn when prose.strip() is empty. Defense in
  depth — if a client bypasses the textarea attribute (custom UI,
  curl, etc.), the server still rejects.

Verified live: POST with empty body returns 400; POST with whitespace-
only returns 400; chat shell renders the textarea with required.
Full suite: 168 passed.
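Below is a framework-agnostic sketch of the server-side half of the fix. `HTTPError` and `require_prose` are hypothetical stand-ins. In the app, the check lives in `post_turn` and raises the web framework's own 400. Whitespace-only input is rejected because `str.strip()` reduces it to an empty string.

```python
from typing import Optional


class HTTPError(Exception):
    """Hypothetical stand-in for the web framework's HTTP exception."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def require_prose(prose: Optional[str]) -> str:
    """Reject missing, empty, or whitespace-only prose before logging an event or calling the LLM."""
    if prose is None or not prose.strip():
        raise HTTPError(400, "prose cannot be empty")
    return prose
```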

dohertj2 added 1 commit 2026-04-26 15:23:10 -04:00

fix: use readOnly (not disabled) to lock textarea during stream f0742dd4f9

The form-submit handler in chat.html was setting
``textarea.disabled = true`` synchronously before the browser actually
serialized the form. Disabled form fields are excluded from
submission, so the request body contained ``prose=""`` even when the
user had typed text — which the server (correctly) rejected with the
new empty-prose 400. Net effect: typing "hello" + Send gave a "prose
cannot be empty" error.

Switched to ``readOnly``: same UX (user can't edit while streaming)
but the field IS submitted. The unlock path now also clears the
textarea and refocuses for the next turn.

dohertj2 added 1 commit 2026-04-26 15:28:11 -04:00

feat: cap narrative response length + tune sampling d161e7b8e9

Bot replies were running long (4 paragraphs of action+dialogue beats
per turn) because we never set max_tokens on the narrative call. Three
changes; the first two are tunable knobs in Settings (set in
data/config.toml to override):

- narrative_max_tokens: int = 400
  Hard cap on each generated response. ~400 tokens ≈ 1–2 short
  paragraphs. Drop to 200 for terse banter, bump to 800+ for longer
  scenes.

- narrative_temperature: float = 0.85
  Sampling temperature. 0.7 = grounded/consistent (slightly stiff),
  0.85 = creative-but-in-character (default), 1.0 = wide variety,
  >1.0 = often off-the-rails.

- prompt closing instruction now nudges: "Keep your response to a
  single beat — one or two short paragraphs at most. Don't monologue;
  leave room for the other person to react."

Both turns.py (post_turn) and regenerate.py forward the params to
client.stream(). FeatherlessClient already passes **params through to
the OpenAI-compat endpoint.

Note: temperature doesn't control length, a common misconception;
max_tokens is the actual length cap. Lower temperature
makes word choice more predictable (slightly stiffer voice), not
shorter. Both knobs are useful for different goals.
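Here is a sketch of how the two Settings values might reach the stream call. `NarrativeSettings`, `narrative_params()`, `run_narrative()`, and `RecordingClient` are hypothetical names. The mock records the keyword arguments that the real client would forward to the OpenAI-compatible endpoint.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, AsyncIterator, Dict, List


@dataclass
class NarrativeSettings:
    narrative_max_tokens: int = 400       # hard length cap
    narrative_temperature: float = 0.85   # variety of word choice, not length


def narrative_params(settings: NarrativeSettings) -> Dict[str, Any]:
    return {
        "max_tokens": settings.narrative_max_tokens,
        "temperature": settings.narrative_temperature,
    }


class RecordingClient:
    """Hypothetical mock LLM client that records the sampling params it receives."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    async def stream(self, messages: List[dict], **params: Any) -> AsyncIterator[str]:
        self.calls.append(params)  # a real client would pass these through to the endpoint
        for token in ("Hi", " there"):
            yield token


async def run_narrative(client, messages: List[dict], settings: NarrativeSettings) -> str:
    chunks = [chunk async for chunk in client.stream(messages, **narrative_params(settings))]
    return "".join(chunks)
```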

dohertj2 added 1 commit 2026-04-26 15:37:11 -04:00

docs: add Phase 2 implementation plan with parallel-safe waves b8335895e1

13 tasks across 6 waves (1, 2, 3, 4a, 4b, 5). Designed for parallel
subagent execution where file-disjointness allows.

Waves 1, 2, 4a, and 5 each contain 2-3 tasks that touch disjoint files
and can be dispatched concurrently via the Agent tool with
isolation: "worktree". Waves 3 (drawer guest support) and 4b (multi-
entity turn flow) are single-task because they touch hot files
(_drawer.html, turns.py) that cannot be safely co-modified.

Plan covers:
- T36: group_node schema + handlers (new migration 0008)
- T37: guest_added / guest_removed event handlers (modifies world.py)
- T38: relationship-seed service ("have they met?")
- T39: interjection classifier service
- T40: multi-entity state-update coordinator (6 directed pairs)
- T41: multi-witness memory write helper
- T42: drawer guest add/remove UI + render
- T43: multi-entity prompt assembly (extends T18)
- T44: multi-entity turn flow (rewrites post_turn)
- T45: multi-entity per-POV summaries on scene close
- T46: witness filter cross-coverage tests
- T47: bot_reset cascades to guest references
- T48: Phase 2 documentation update
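The "6 directed pairs" in T40 come from counting ordered pairs over three entities (you, host, guest). An edge from A to B is distinct from an edge from B to A, so n entities give n * (n - 1) pairs, and 3 * 2 = 6. Below is a small sketch using `itertools.permutations`; `directed_pairs` is a hypothetical helper name. The same formula gives 2 pairs for the single-bot you + bot case.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def directed_pairs(entities: Sequence[str]) -> List[Tuple[str, str]]:
    """Every ordered (from, to) pair of distinct entities: n * (n - 1) of them."""
    return list(permutations(entities, 2))
```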

Plan also documents:
- Worktree-per-subagent dispatch pattern using Agent isolation flag
- Merge ordering per wave (file-disjointness = conflict-free merges)
- Failure recovery (cancel failed parallel task, re-dispatch as solo)
- Conflict prevention checklist (verify Files sections disjoint per wave)

Tasks file (.tasks.json) carries dependency DAG with `blockedBy` and
`parallelGroup` so a future executing-plans run can dispatch correctly.

NOT EXECUTING. Plan only.
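Here is a sketch of how a dispatcher could derive waves from the `blockedBy` DAG. It assumes a hypothetical task shape with a `files` list. The real .tasks.json also carries `parallelGroup`; this sketch ignores that field and recomputes the grouping. A task becomes ready once all its blockers are done. Within a ready set, a task that shares a file with one already in the wave is deferred, which is how a split like 4a/4b can arise from a file conflict.

```python
from typing import Dict, List


def plan_waves(tasks: List[dict]) -> List[List[str]]:
    """Group task ids into waves: blockers finish first, and no two tasks in a wave share a file."""
    remaining: Dict[str, dict] = {t["id"]: t for t in tasks}
    done: set = set()
    waves: List[List[str]] = []
    while remaining:
        ready = sorted(
            (t for t in remaining.values() if set(t["blockedBy"]) <= done),
            key=lambda t: t["id"],
        )
        if not ready:
            raise ValueError("dependency cycle or unknown blocker among: " + ", ".join(sorted(remaining)))
        wave: List[str] = []
        touched: set = set()
        for task in ready:
            files = set(task["files"])
            if files & touched:
                continue  # shares a file with a task already in this wave: defer it
            wave.append(task["id"])
            touched |= files
        waves.append(wave)
        done.update(wave)
        for task_id in wave:
            del remaining[task_id]
    return waves
```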

dohertj2 referenced this pull request

2026-04-26 16:44:19 -04:00

Phase 2: multi-entity scene support (you + host + guest) #2

dohertj2 merged commit 3be7920f41 into main

2026-04-26 19:59:30 -04:00

dohertj2 referenced this issue from a commit

2026-04-26 19:59:31 -04:00

Merge pull request 'Phase 1: v1 single-bot roleplay engine' (#1) from phase-1 into main

dohertj2 referenced this issue from a commit

2026-04-27 14:26:32 -04:00

fix: chat UI — load htmx-ext-sse, render user prose optimistically, AJAX submit
