Phase 1: v1 single-bot roleplay engine #1

Merged
dohertj2 merged 45 commits from phase-1 into main 2026-04-26 19:59:30 -04:00

Summary

Phase 1 of the Roleplay Engine, end-to-end. 38 commits, 168 tests passing, schema version 7.

The single-bot core loop is functional: bot authoring with kickoff parse-and-confirm, streaming turns over SSE with multi-tab sync, drawer rendering with edit affordances, scene-close detection with per-POV summary rewrites, edge updates per turn, FTS5 retrieval with witness filter + ranking boosts, periodic snapshots + nightly backups, rewind/regenerate/reset, first-run navigation, friendly 404/500 pages.

What's in this PR

  • Phase 1A (T0–T4): foundation — project skeleton, config loader, SQLite migrations, LLM client (Featherless + Mock), classifier wrapper.
  • Phase 1B (T5–T9): event log + projector, bot/you entities, edges, memory + FTS5, world tables.
  • Phase 1C (T10–T13): kickoff prose parser, bot authoring form, settings (you-entity), kickoff parse-and-confirm.
  • Phase 1D (T14–T19): top-level nav, chat shell, SSE channel, turn parser, prompt assembly with trim tiers, narrative streaming.
  • Phase 1E (T20–T23): post-turn state-update pass + append_and_apply, per-turn memory writes, async significance pass + auto-pin, ranked retrieval.
  • Phase 1F (T24–T27): drawer + drawer edits + scene close + per-POV summary on close.
  • Phase 1G (T28–T30): rewind w/ snapshot, regenerate, bot reset.
  • Phase 1H (T31–T35): periodic snapshots + cold-load, nightly backups, display formatting, streaming UX, first-run + friendly errors.
  • Cleanup (365dacc): *.egg-info/ gitignored, pip install -e . packaging fix, Phase 1.5 backlog documented in CLAUDE.md.

Design references

  • Architecture: rp-engine-design.md
  • v1 requirements: docs/plans/2026-04-26-v1-requirements-design.md
  • Implementation plan: docs/plans/2026-04-26-v1-phase1-implementation.md

Test plan

  • [x] .venv/bin/pytest (168 passed)
  • [x] .venv/bin/pip install -e . (succeeds, post-cleanup)
  • [ ] Manual smoke: copy data/config.example.toml → data/config.toml with a real Featherless key, run uvicorn chat.app:app --reload, walk through first-run (settings → bot author → kickoff confirm → chat), play 5 turns, open drawer, edit affinity, close scene, rewind, regenerate, multi-tab sync.
  • [ ] Manual smoke: reset a bot via the bot-list reset form.

Phase 1.5 cleanup backlog

Documented in CLAUDE.md. None blocking. Track via that file rather than separate issues.

Out of scope (Phase 2+)

  • Multi-bot scenes (guest, group node, scene configs)
  • Time skips, events with lifecycles, active threads
  • Vector retrieval, branching UI, surgical delete
dohertj2 added 38 commits 2026-04-26 14:40:26 -04:00
- .gitignore: add *.egg-info/ so editable installs don't show in git status.
- pyproject.toml: add [build-system] and [tool.setuptools.packages.find]
  scoped to chat*, fixing pip install -e . which was failing on data/
  auto-discovery.
- CLAUDE.md: add Phase 1.5 cleanup backlog section under Phase 1 status,
  capturing the small follow-ups surfaced in implementer reviews
  (open_db refactor, regenerate SSE broadcast, you-activity purge,
  drawer edits for deferred fields, NICE trim order).
dohertj2 added 1 commit 2026-04-26 14:50:10 -04:00
Idempotent seeder for three sample bots (Maya — coworker slow-burn,
Eli — live-in partner, Sam — bartender / new connection). Each is a
distinct relational archetype to exercise the system from different
angles. Run from repo root:

    .venv/bin/python scripts/seed_sample_bots.py

Re-running skips ids that already exist. After seeding, walk each bot
through kickoff parse-and-confirm at /bots/<id>/kickoff.
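
The skip-on-rerun behavior described above can be sketched roughly as follows. This is an illustration only, not the contents of scripts/seed_sample_bots.py: the `bots` table, its columns, and `seed_bots` are hypothetical names, and SQLite's `INSERT OR IGNORE` is just one way to get idempotence.

```python
import sqlite3

# Hypothetical sample rows; the real seeder's fields are not shown in this PR.
SAMPLE_BOTS = [
    ("maya", "Maya"),
    ("eli", "Eli"),
    ("sam", "Sam"),
]


def seed_bots(conn: sqlite3.Connection) -> list:
    """Insert each sample bot whose id is missing; return the ids actually inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bots (id TEXT PRIMARY KEY, name TEXT NOT NULL)"
    )
    inserted = []
    for bot_id, name in SAMPLE_BOTS:
        # OR IGNORE turns a primary-key conflict into a no-op instead of an error.
        cur = conn.execute(
            "INSERT OR IGNORE INTO bots (id, name) VALUES (?, ?)", (bot_id, name)
        )
        if cur.rowcount == 1:  # 0 means the id already existed and was skipped
            inserted.append(bot_id)
    conn.commit()
    return inserted
```

Calling `seed_bots` twice on the same database inserts the three rows the first time and nothing the second time.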
dohertj2 added 1 commit 2026-04-26 15:03:17 -04:00
The kickoff parse-and-confirm route was 500-ing intermittently because
Hermes-3 + Featherless's response_format={"type":"json_object"} only
guarantees JSON output, NOT a particular schema. The model was inventing
its own field names (sceneTime, entities, settingDetails) instead of
the KickoffParse fields, causing Pydantic validation to fail on both
classify() retries.

Three changes:

1. Include the Pydantic JSON schema in the system prompt so the model
   knows exactly which keys to produce. Affects every classify() call
   (kickoff parse, turn parse, scene-close detect, significance,
   state-update, scene summarize). Strip ```json fences if the model
   wraps its output. Bump retries 2 → 3 (model is stochastic; one extra
   attempt closes most of the remaining gap).

2. parse_kickoff() now passes a default empty KickoffParse so the
   route degrades to a fillable form instead of 500 when the classifier
   ultimately fails. The confirm form is the human-in-the-loop; an
   empty form is strictly better UX than a stack trace.

3. Tests updated: bumped canned-failure arrays from 2 → 3 entries to
   match the new attempt count; renamed kickoff test from
   "raises_when_classifier_fails_twice" to
   "falls_back_to_empty_when_classifier_fails" reflecting the new
   degraded-but-usable behavior.
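
A minimal sketch of the three changes working together, assuming a generic `call_model` callable in place of the real LLM client. The names `strip_json_fences`, `classify`, and the `scene_time` field in the usage note are hypothetical, and the required-key check is a stdlib stand-in for the Pydantic validation the actual code uses.

```python
import json
from typing import Callable, Optional

FENCE = "`" * 3  # built this way so the example's own code fence stays intact


def strip_json_fences(text: str) -> str:
    """Remove a surrounding Markdown code fence (with optional language tag)."""
    text = text.strip()
    if text.startswith(FENCE):
        _, _, text = text.partition("\n")  # drop the opening fence line
        text = text.rstrip()
        if text.endswith(FENCE):
            text = text[: -len(FENCE)]
    return text.strip()


def classify(
    call_model: Callable[[str], str],
    schema: dict,
    user_text: str,
    retries: int = 3,
    default: Optional[dict] = None,
) -> dict:
    """Request JSON matching `schema`; retry on bad output; fall back to `default`."""
    # Put the schema in the prompt so the model sees the exact key names.
    system = "Reply with one JSON object matching this JSON schema:\n" + json.dumps(schema)
    required = schema.get("required", [])
    for _ in range(retries):
        raw = call_model(system + "\n\n" + user_text)
        try:
            data = json.loads(strip_json_fences(raw))
        except json.JSONDecodeError:
            continue  # not JSON at all: try again
        if isinstance(data, dict) and all(key in data for key in required):
            return data  # valid shape
    if default is not None:
        return default  # degrade to an empty-but-usable result
    raise ValueError(f"classifier failed after {retries} attempts")
```

With a schema of `{"type": "object", "required": ["scene_time"]}`, a first reply keyed `sceneTime` is rejected and a second, fenced reply keyed `scene_time` is accepted; passing `default={}` makes total failure return an empty result instead of raising.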

Verified live with all 3 sample bots (maya/eli/sam) — kickoff route
returns 200 across multiple attempts. Full suite: 168 passed.
dohertj2 added 1 commit 2026-04-26 15:15:16 -04:00
Two related issues blocking real-world use of the kickoff parse:

1. Classifier calls take ~12s end-to-end on Featherless for the
   complex KickoffParse schema (Hermes-3-8B generating ~1.3KB of
   structured JSON). The 10s timeout was firing on most attempts,
   causing all 3 retries to time out and the empty-fallback to render
   with blank form values. Raising the default classifier_timeout_s
   from 10s to 30s leaves comfortable headroom over the measured p99
   of ~13s.

2. Featherless caps concurrent connections per account (2 on free /
   lower paid tiers). Each turn flow can fire 4–5 calls (parse,
   scene-close detect, narrative stream, two state-update passes)
   plus the background significance worker. Without a gate, we'd
   exceed the cap and fail.

   Added a class-level ``asyncio.Semaphore`` to FeatherlessClient,
   shared across all instances, configured once in lifespan from
   ``Settings.featherless_max_concurrent`` (default 2). Both
   ``generate`` and ``stream`` acquire the semaphore for the duration
   of the call; the stream holds it until the async generator
   completes, so token streaming is correctly accounted for.
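
The gating pattern can be sketched as below. `GatedClient`, `configure`, `tracked`, and the fake sleep-based calls are hypothetical stand-ins, not the real FeatherlessClient; the point is only that a class-level semaphore is shared by every instance and that an async generator keeps it held until iteration finishes.

```python
import asyncio
from contextlib import contextmanager
from typing import AsyncIterator, Optional

stats = {"active": 0, "peak": 0}  # demo-only bookkeeping of concurrent holders


@contextmanager
def tracked():
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    try:
        yield
    finally:
        stats["active"] -= 1


class GatedClient:
    """Hypothetical client with one process-wide concurrency gate."""

    _gate: Optional[asyncio.Semaphore] = None  # shared across all instances

    @classmethod
    def configure(cls, max_concurrent: int) -> None:
        # Call once at startup, from inside the running event loop.
        cls._gate = asyncio.Semaphore(max_concurrent)

    async def generate(self, prompt: str) -> str:
        async with self._gate:
            with tracked():
                await asyncio.sleep(0.01)  # stands in for the HTTP call
                return f"echo {prompt}"

    async def stream(self, prompt: str) -> AsyncIterator[str]:
        # The semaphore stays held until the generator is exhausted.
        async with self._gate:
            with tracked():
                for token in ("echo", prompt):
                    await asyncio.sleep(0.01)
                    yield token


async def demo() -> int:
    GatedClient.configure(max_concurrent=2)
    a, b = GatedClient(), GatedClient()  # separate instances, same gate

    async def drain(client: GatedClient, prompt: str) -> list:
        return [tok async for tok in client.stream(prompt)]

    await asyncio.gather(
        a.generate("1"), b.generate("2"), drain(a, "3"), drain(b, "4"), a.generate("5")
    )
    return stats["peak"]
```

Running `asyncio.run(demo())` launches five calls at once, yet the recorded peak never exceeds the configured limit of 2.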

Verified live: 4/4 sequential kickoff parses for the same bot all
succeed with real parsed values (previously ~50% blank-fallback).
Full suite: 168 passed.
dohertj2 added 1 commit 2026-04-26 15:20:06 -04:00
Empty submission was producing a blank user_turn event in the log and
firing the LLM stream anyway — the bot would invent a response from the
kickoff context alone, producing a monologue with no user input. Two-
layer fix:

- Browser: add `required` to the prose textarea in chat.html so the
  form refuses to submit empty.
- Server: 400 in post_turn when prose.strip() is empty. Defense in
  depth — if a client bypasses the textarea attribute (custom UI,
  curl, etc.), the server still rejects.
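
The server-side half of the check amounts to something like the following sketch. `check_prose` is a hypothetical helper, shown framework-free; the actual post_turn handler presumably raises the web framework's own 400 response.

```python
from typing import Optional, Tuple


def check_prose(prose: Optional[str]) -> Tuple[int, str]:
    """Return (status_code, detail); reject missing or whitespace-only prose."""
    if not (prose or "").strip():
        return 400, "prose cannot be empty"
    return 200, "ok"
```

Both `check_prose("")` and `check_prose("   \n")` yield a 400, so a client that skips the browser-side `required` attribute is still rejected.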

Verified live: POST with empty body returns 400; POST with whitespace-
only returns 400; chat shell renders the textarea with required.
Full suite: 168 passed.
dohertj2 added 1 commit 2026-04-26 15:23:10 -04:00
The form-submit handler in chat.html was setting
``textarea.disabled = true`` synchronously before the browser actually
serialized the form. Disabled form fields are excluded from
submission, so the request body contained ``prose=""`` even when the
user had typed text — which the server (correctly) rejected with the
new empty-prose 400. Net effect: typing "hello" + Send gave a "prose
cannot be empty" error.

Switched to ``readOnly``: same UX (user can't edit while streaming)
but the field IS submitted. The unlock path now also clears the
textarea and refocuses for the next turn.
dohertj2 added 1 commit 2026-04-26 15:28:11 -04:00
Bot replies were running long (4 paragraphs of action+dialogue beats
per turn) because we never set max_tokens on the narrative call. Three
changes: two tunable knobs in Settings (set in data/config.toml to
override) and a prompt nudge:

- narrative_max_tokens: int = 400
  Hard cap on each generated response. ~400 tokens ≈ 1–2 short
  paragraphs. Drop to 200 for terse banter, bump to 800+ for longer
  scenes.

- narrative_temperature: float = 0.85
  Sampling temperature. 0.7 = grounded/consistent (slightly stiff),
  0.85 = creative-but-in-character (default), 1.0 = wide variety,
  >1.0 = often off-the-rails.

- prompt closing instruction now nudges: "Keep your response to a
  single beat — one or two short paragraphs at most. Don't monologue;
  leave room for the other person to react."

Both turns.py (post_turn) and regenerate.py forward the params to
client.stream(). FeatherlessClient already passes **params through to
the OpenAI-compat endpoint.

Note: temperature doesn't control length; that is a common
misconception. max_tokens is the actual length cap. Lower temperature
makes word choice more predictable (slightly stiffer voice), not
shorter. The two knobs serve different goals.
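
As a rough illustration of how the two settings reach the streaming call: the sketch below assumes a standalone `NarrativeSettings` dataclass and a `stream_params` helper, both hypothetical. The real code keeps these fields on the project's Settings object and passes them to client.stream().

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NarrativeSettings:
    """Hypothetical holder for the two knobs, using the defaults from this commit."""

    narrative_max_tokens: int = 400      # hard cap on response length
    narrative_temperature: float = 0.85  # sampling variety, not length


def stream_params(settings: NarrativeSettings) -> dict:
    """Map settings onto OpenAI-compatible request parameter names."""
    return {
        "max_tokens": settings.narrative_max_tokens,
        "temperature": settings.narrative_temperature,
    }


# Example override for terse banter: only the length cap changes.
terse = replace(NarrativeSettings(), narrative_max_tokens=200)
```

A call site could then forward the values with `client.stream(prompt, **stream_params(settings))`, where `client` and `prompt` are whatever the caller already has in hand.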
dohertj2 added 1 commit 2026-04-26 15:37:11 -04:00
13 tasks across 6 waves (1, 2, 3, 4a, 4b, 5). Designed for parallel
subagent execution where file-disjointness allows.

Waves 1, 2, 4a, and 5 each contain 2-3 tasks that touch disjoint files
and can be dispatched concurrently via the Agent tool with
isolation: "worktree". Waves 3 (drawer guest support) and 4b (multi-
entity turn flow) are single-task because they touch hot files
(_drawer.html, turns.py) that cannot be safely co-modified.

Plan covers:
- T36: group_node schema + handlers (new migration 0008)
- T37: guest_added / guest_removed event handlers (modifies world.py)
- T38: relationship-seed service ("have they met?")
- T39: interjection classifier service
- T40: multi-entity state-update coordinator (6 directed pairs)
- T41: multi-witness memory write helper
- T42: drawer guest add/remove UI + render
- T43: multi-entity prompt assembly (extends T18)
- T44: multi-entity turn flow (rewrites post_turn)
- T45: multi-entity per-POV summaries on scene close
- T46: witness filter cross-coverage tests
- T47: bot_reset cascades to guest references
- T48: Phase 2 documentation update

Plan also documents:
- Worktree-per-subagent dispatch pattern using Agent isolation flag
- Merge ordering per wave (file-disjointness = conflict-free merges)
- Failure recovery (cancel failed parallel task, re-dispatch as solo)
- Conflict prevention checklist (verify Files sections disjoint per wave)

Tasks file (.tasks.json) carries dependency DAG with `blockedBy` and
`parallelGroup` so a future executing-plans run can dispatch correctly.
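
One way a runner could turn `blockedBy` edges into dispatchable waves is sketched below, using the standard library's `graphlib`. The task dictionaries and `dispatch_waves` are assumptions for illustration; the actual .tasks.json layout and the meaning of `parallelGroup` are not shown here, and dependency edges alone do not capture the hot-file constraint that keeps some waves single-task.

```python
from graphlib import TopologicalSorter


def dispatch_waves(tasks: list) -> list:
    """Group task ids into waves; a task is ready once all its blockers are done."""
    graph = {t["id"]: set(t.get("blockedBy", [])) for t in tasks}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unblock dependents
    return waves


# Hypothetical tasks, not the real T36-T48 dependency graph.
example = [
    {"id": "A", "blockedBy": []},
    {"id": "B", "blockedBy": []},
    {"id": "C", "blockedBy": ["A", "B"]},
    {"id": "D", "blockedBy": ["C"]},
]
```

For `example`, the first wave holds A and B (safe to run in parallel), followed by C and then D on their own.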

NOT EXECUTING. Plan only.
dohertj2 merged commit 3be7920f41 into main 2026-04-26 19:59:30 -04:00
Reference: dohertj2/chat#1