v1 Requirements & Behavioral Design

Date: 2026-04-26
Status: approved (brainstorming complete; pending writing-plans)
Companion document: rp-engine-design.md (architectural design)

This document captures the product- and behavior-level requirements for v1, derived from a structured brainstorm. The architectural design doc covers the how (event sourcing, schema sketch, prompt construction). This doc covers the what (user experience, scope cuts, and the contract between the user and the system).


1. Product Vision

A local-first, single-user roleplay engine for relational and companion-style RP. The goal is bots that feel persistent, consistent, and like they have inner lives — durable across long horizons, immune to the three classic failure modes (memory loss, quality decay, prop pollution).

The LLM is treated as a renderer for structured world state, not as the state-holder. State lives in an event-sourced SQLite database that survives any model outage and is replayable for free.

2. Scope

2.1 In scope (v1)

  • Single user, single Mac (always-on).
  • A library of bots the user has authored. Each bot is a persistent entity with its own identity, memory, edges, and clock.
  • One chat per bot. A second bot can be added as a guest into any chat. Hard cap: 2 bots in any scene.
  • Explicit / mature content allowed.
  • Featherless as the LLM backend over its OpenAI-compatible API. Two model slots:
    • narrative_model: dphn/Dolphin-Mistral-24B-Venice-Edition (uncensored, narrative-grade). 32K context.
    • classifier_model: NousResearch/Hermes-3-Llama-3.1-8B (uncensored, tuned for tool use / structured output). 128K context. Fallback chain if it underperforms on JSON: cognitivecomputations/dolphin-2.9.4-llama3-8b, then mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated.
    • Classifier is used for: turn parsing (dialogue/action/ooc), kickoff prose parsing, scene-close detection, interjection decisions, significance scoring, state-update extraction, jump-skip memory synthesis.

2.2 Out of scope (v1)

  • Multi-user / multi-device.
  • More than 2 bots in a scene.
  • Proactive bot contact / out-of-session messages / push notifications.
  • Background world ticks ("meanwhile…" scenes that play while the user is idle) — deferred to Phase 2-3.
  • Branching timelines with UI (mechanically supported by event sourcing; not exposed yet).
  • Multimodality: no portraits, voice, or images. Defer indefinitely.

2.3 Non-functional constraints

  • All state survives any model or network outage.
  • Each turn writes one transactional event-log batch — no half-applied state.
  • Streaming required for narrative output.
  • One SQLite file is the entire data layer. No Postgres, Redis, Pinecone, Docker.
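The "one transactional batch per turn" constraint can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch, not the project's schema: the events table layout and the names open_db and write_turn_batch are assumptions made for illustration.

```python
import json
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the single SQLite file and create a minimal event table (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " chat_id TEXT NOT NULL,"
        " turn_no INTEGER NOT NULL,"
        " kind TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def write_turn_batch(conn: sqlite3.Connection, chat_id: str, turn_no: int,
                     events: list[tuple[str, dict]]) -> None:
    """Append every event of one turn atomically: all rows land, or none do."""
    with conn:  # commits on success, rolls back if anything inside raises
        for kind, payload in events:
            conn.execute(
                "INSERT INTO events (chat_id, turn_no, kind, payload) VALUES (?, ?, ?, ?)",
                (chat_id, turn_no, kind, json.dumps(payload)),
            )
```

If any event in the batch fails to serialize or insert, the rows already written for that turn are rolled back, so the log never holds a half-applied turn.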

3. Architecture & Backend

[ Mac, always-on ]                              [ Featherless, stateless ]
  Web UI  ──► Orchestrator ──► LLM client ──►   narrative_model    (Dolphin-Mistral-24B-Venice)
                  │                              classifier_model  (Hermes-3-Llama-3.1-8B)
                  ├── Event log + projector
                  ├── SQLite (one file)
                  └── Retrieval + prompt builder

The orchestrator never knows which model is in use — only generate(prompt, params) -> text (streamed). The Featherless client is one implementation; mocks and other backends can drop in for tests or future migration.
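A sketch of what that backend boundary could look like, using a typing.Protocol so that a mock can stand in for the Featherless client in tests. The names LLMBackend, MockBackend, and run_turn are hypothetical; only the generate(prompt, params) shape comes from the text above.

```python
from typing import Iterator, Protocol


class LLMBackend(Protocol):
    """The only surface the orchestrator sees: a prompt in, streamed text out."""

    def generate(self, prompt: str, params: dict) -> Iterator[str]: ...


class MockBackend:
    """Test double that streams a canned reply word by word."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def generate(self, prompt: str, params: dict) -> Iterator[str]:
        for word in self.reply.split(" "):
            yield word + " "


def run_turn(backend: LLMBackend, prompt: str) -> str:
    """Consume the stream chunk by chunk, as the orchestrator would while forwarding tokens."""
    chunks = []
    for chunk in backend.generate(prompt, {"temperature": 0.8}):
        chunks.append(chunk)
    return "".join(chunks).strip()
```

Because run_turn depends only on the protocol, swapping the mock for a real HTTP client would not change orchestrator code.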

API key handling: keys live in a local config file outside the repository. Never commit a key to the repo, paste in chat logs, or include in exports.

3.1 Runtime stack

  • Backend: Python 3.11+ with FastAPI as the HTTP server.
  • Frontend: server-rendered HTML + HTMX + minimal vanilla JS/CSS. No JS build chain.
  • Live updates: Server-Sent Events (SSE) per chat. Server keeps a per-chat in-process pub/sub channel (an asyncio.Queue per chat_id). Every browser tab on /chats/<id> opens an SSE connection to /chats/<id>/events. State changes (new turn, streamed tokens, drawer state, edge updates, scene close) publish to the channel; all subscribed tabs receive the event and HTMX swaps the relevant DOM region.
  • Multi-tab sync is a Phase 1 requirement, not a polish item. Two browser tabs open to the same chat must mirror each other in real time. Implications:
    • In-progress typing is tab-local until submit (no collaborative input in v1).
    • On reconnect/refresh, the server first sends a "current state" snapshot, then resumes streaming.
    • The same architecture trivially supports a phone or tablet on the LAN later — bind to 0.0.0.0 + add a shared-secret token if/when desired. Default is 127.0.0.1, no auth.
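The multi-tab fan-out can be sketched with asyncio alone. One deliberate difference from the wording above: a single shared asyncio.Queue hands each item to only one consumer, so this sketch keeps one queue per subscribed tab, grouped by chat_id. The class name ChatChannels and the event dictionaries are assumptions for illustration.

```python
import asyncio
from collections import defaultdict


class ChatChannels:
    """In-process pub/sub: every subscribed tab of a chat receives every event."""

    def __init__(self) -> None:
        self._subs: dict[str, set[asyncio.Queue]] = defaultdict(set)

    def subscribe(self, chat_id: str, snapshot: dict) -> asyncio.Queue:
        """Register a tab; the current-state snapshot is queued before any live event."""
        queue: asyncio.Queue = asyncio.Queue()
        queue.put_nowait({"type": "snapshot", "data": snapshot})
        self._subs[chat_id].add(queue)
        return queue

    def unsubscribe(self, chat_id: str, queue: asyncio.Queue) -> None:
        self._subs[chat_id].discard(queue)

    def publish(self, chat_id: str, event: dict) -> None:
        """Copy the event into the queue of every tab subscribed to this chat."""
        for queue in self._subs[chat_id]:
            queue.put_nowait(event)


async def demo() -> list[list[str]]:
    channels = ChatChannels()
    tab_a = channels.subscribe("chat-1", {"turns": 3})
    tab_b = channels.subscribe("chat-1", {"turns": 3})
    channels.publish("chat-1", {"type": "token", "data": "Hel"})
    # Each tab sees the snapshot first, then the streamed token.
    return [[tab.get_nowait()["type"] for _ in range(2)] for tab in (tab_a, tab_b)]
```

An SSE endpoint would await queue.get() in a loop and write each event to the response; that part is omitted here.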

3.2 Token budgets and trimming tiers

Token accounting via tiktoken with the closest cl100k approximation. Mistral and Llama tokenizers diverge ~5%; we accept the drift.

  • Narrative prompt: 8K hard ceiling, 6K soft target. Leaves ~2-4K headroom for streamed output and avoids long-context performance cliffs. Plenty for our prompt shape.
  • Classifier prompt: 4K hard ceiling. Most calls are well under 1K.

When the assembled prompt exceeds the soft target, trim in this order — never trim must-include:

  • MUST-include (always present):
    • System message + speaker identity
    • Speaker's edge to the addressee
    • Activity snapshot for all present entities
    • Current scene description
    • Last 4 turns of dialogue
  • SHOULD-include (trim when over budget):
    • Other edges of the speaker (e.g. speaker → other present)
    • Group node summary (when applicable)
    • Active threads
    • Currently active events + props
  • NICE-include (trim first):
    • Retrieved memories beyond top-2 (drop K=4 to K=2)
    • Dialogue turns beyond the last 4 (replace older turns with a one-line summary)
    • Per-POV summary of the previous scene
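The trim order can be sketched as a small function over pre-counted prompt blocks. This is a simplification under stated assumptions: the Block type and trim_to_budget are hypothetical, token counts are supplied by the caller, and whole blocks are dropped, whereas the tiers above also allow softer moves (reducing K, summarizing older turns).

```python
from dataclasses import dataclass

SOFT_TARGET = 6_000  # narrative prompt soft target from this section


@dataclass
class Block:
    name: str
    tier: str    # "must" | "should" | "nice"
    tokens: int  # token count, computed elsewhere


def trim_to_budget(blocks: list[Block], soft_target: int = SOFT_TARGET) -> list[Block]:
    """Drop NICE blocks first, then SHOULD blocks, until the prompt fits. MUST is never dropped."""
    kept = list(blocks)
    for tier in ("nice", "should"):
        # Within a tier, drop from the end of the prompt order first.
        for block in reversed([b for b in kept if b.tier == tier]):
            if sum(b.tokens for b in kept) <= soft_target:
                return kept
            kept.remove(block)
    return kept
```

If the must-include blocks alone exceed the target, the function returns them untouched; enforcing the 8K hard ceiling would need a separate check.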

3.3 Classifier failure handling

The classifier does ~5 different jobs (turn parse, scene-close detection, interjection, significance, state-update). Any can fail (malformed JSON, refusal, timeout). The play loop must never block on classifier errors.

  1. Constrained output first. Use Pydantic models for every classifier call, passed through instructor (or Featherless-native JSON-schema mode if available).
  2. One retry on parse failure with a stricter "respond with JSON only" reminder appended. Same prompt, same model.
  3. Schema-default fallback on second failure:
    • turn-parse default: treat the whole turn as one dialogue segment.
    • scene-close default: don't close.
    • interjection default: don't interject.
    • significance default: 1 (Notable). Conservative.
    • state-update default: no deltas, no facts; just bump last_interaction.
  4. Log every fallback to a classifier_failures table (event_id, kind, raw_text, attempt_count). Drawer dev panel surfaces a count.
  5. Timeout: 10s per classifier call. Beyond that, fall back. Narrative model has no orchestrator-imposed timeout (let it stream).
  6. Refusal detection: if the response starts with a refusal pattern ("I can't", "I'm sorry, but…") and isn't valid JSON, treat as a parse failure and retry with the JSON-only reminder. If it refuses again, swap to the next model in the fallback chain (dolphin-2.9.4-llama3-8b first), automatically, for that one call. Log it.
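A standard-library sketch of steps 2 to 4 (one retry, schema-default fallback, failure logging). It is not the Pydantic/instructor path from step 1: plain json.loads stands in for schema validation, a timeout is treated like a parse failure, and the model swap from step 6 is left out. The names classify, DEFAULTS, and call_model are hypothetical, and the turn-parse default is omitted because it depends on the input text.

```python
import copy
import json
from typing import Callable

JSON_ONLY_REMINDER = "\n\nRespond with JSON only."

# Schema-default fallbacks from step 3, keyed by classifier job (field names are assumptions).
DEFAULTS: dict[str, dict] = {
    "scene_close": {"close": False},
    "interjection": {"interject": False},
    "significance": {"score": 1},
    "state_update": {"affinity_delta": 0, "trust_delta": 0, "knowledge_facts": []},
}


def classify(kind: str, prompt: str, call_model: Callable[[str], str],
             failures: list[dict]) -> dict:
    """Try once, retry once with a stricter reminder, then fall back to the job's default."""
    raw = ""
    for attempt in (1, 2):
        text = prompt if attempt == 1 else prompt + JSON_ONLY_REMINDER
        try:
            raw = call_model(text)
            parsed = json.loads(raw)
            if isinstance(parsed, dict):
                return parsed
        except (ValueError, TimeoutError):  # malformed JSON, refusal prose, or timeout
            pass
    # Second failure: record it (a list stands in for the classifier_failures table).
    failures.append({"kind": kind, "raw_text": raw, "attempt_count": 2})
    return copy.deepcopy(DEFAULTS[kind])
```

The play loop always gets a usable dictionary back, which is the property this section asks for.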

3.4 Edge update granularity

Edges drift smoothly turn-to-turn so retrieval and prompt assembly always see current values; long-form summary only churns at compression boundaries.

Per turn (cheap, small): the post-utterance state-update classifier on each present entity produces a delta:

  • affinity_delta (signed integer from -3 to +3, applied on a 0-100 scale; conservative).
  • trust_delta (same shape).
  • knowledge_facts: [string] — new things this entity now knows about the target.
  • last_interaction is bumped to the current chat-id + chat-clock unconditionally.
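Applying a per-turn delta might look like the sketch below. The Edge dataclass, its starting values of 50, and apply_turn_delta are assumptions for illustration; the clamping ranges follow the bullets above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Edge:
    affinity: int = 50  # starting value is an assumption, not from the design
    trust: int = 50
    knowledge_facts: list[str] = field(default_factory=list)
    last_interaction: Optional[tuple[str, str]] = None  # (chat_id, chat_clock)


def clamp(value: int, low: int, high: int) -> int:
    return max(low, min(high, value))


def apply_turn_delta(edge: Edge, affinity_delta: int, trust_delta: int,
                     new_facts: list[str], chat_id: str, chat_clock: str) -> None:
    """Apply one turn's state-update output to a directed edge, in place."""
    # Deltas are limited to -3..+3; resulting scores stay on the 0-100 scale.
    edge.affinity = clamp(edge.affinity + clamp(affinity_delta, -3, 3), 0, 100)
    edge.trust = clamp(edge.trust + clamp(trust_delta, -3, 3), 0, 100)
    for fact in new_facts:
        if fact not in edge.knowledge_facts:
            edge.knowledge_facts.append(fact)
    # Bumped unconditionally, even when both deltas are zero.
    edge.last_interaction = (chat_id, chat_clock)
```

Clamping the delta as well as the result keeps a misbehaving classifier from moving an edge by more than three points in a single turn.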

Per scene close (expensive, summarizing):

  • Edge summary rewritten by the classifier from the per-POV summary of the closing scene plus the prior summary. Aggregates the meaningful arc.
  • "Shared private moments" list appended to (one entry per closed scene at significance ≥ 2).

Every edge mutation goes into the event log as an edge_update event so rollback works.

4. Data Model (top-level entities)

  • Bot — top-level persistent unit. Has identity (immutable per session), state (mood/goals/status), per-bot clock, kickoff spec.
  • Chat — exactly one per bot, with that bot as host. A chat carries an optional current guest bot, its own clock, and its own active scene. Chats do not own memories or identity — those are bot-owned.
  • You — singleton entity with a light identity (name, voice/persona summary).
  • Edges — per-pair, persistent across chats:
    • bot → you, you → bot for every authored bot
    • botA → botB, botB → botA initialized the first time two bots co-appear
    • Edges hold: affinity, trust, summary, knowledge known about target, last interaction (chat-id + chat-clock), shared private moments. Asymmetric — never collapsed into a single shared "relationship" field.
  • Memory — bot-owned. Each memory carries: a 3-bit [you, host, guest] witnessed-by mask, the chat it occurred in, that chat's clock at the time, source/reliability if secondhand, significance, embedding (Phase 4).
  • Events / threads — scoped to the chat where they exist. (Phase 3.)
  • Scenes — recorded per chat, with per-POV summaries written for each present witness on close.

4.1 Per-chat clocks

Each chat has its own chat_state.time, initialized from its host bot's kickoff scene and advanced only by explicit user time skips within that chat. Two clocks for two chats are independent — you may spend in-fiction days with BotA without BotB's clock advancing.

When BotB guests in BotA's chat, the scene runs on BotA's chat clock; memories written to both bots are timestamped at that clock value. BotB's chat clock is unchanged. Cross-chat time arithmetic is intentionally fuzzy — bots can reference cross-chat events ("when I came over that night") but the system does not claim precise "X days ago" math across chats. Within a single chat, time math is precise.

4.2 Bot library

Bots are top-level. Bot count in the system is unbounded. Per-scene cap is 2 bots: you + 1 host + an optional guest.

4.3 Chat clock format (storage vs display)

  • Stored as a precise UTC datetime (ISO 8601). Initialized at kickoff to a sensible default (the kickoff time mentioned in prose, or a "now in-fiction" fallback).
  • Displayed in the chat header as a friendly relative format: "Tuesday evening, 9:14pm" or "Day 3 — late afternoon". Helps preserve fictional feel. Computed from the stored datetime + the chat's narrative anchor (the in-fiction date corresponding to "Day 1").
  • The drawer shows the precise stored datetime alongside the friendly form.
  • Time skips advance the stored datetime; the friendly format re-renders.
  • This split lets the system do precise time arithmetic (last-interaction calculations, Phase 3 event triggers) while showing humane time to the user.
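The storage/display split can be sketched with the standard datetime module. The day-part boundaries and the function names are assumptions; the only constraint taken from the examples above is that 9:14pm still counts as "evening".

```python
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def day_part(hour: int) -> str:
    """Map an hour to a coarse label (boundaries are illustrative)."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 22:
        return "evening"
    return "night"


def friendly_time(stored: datetime) -> str:
    """Render the precise stored datetime in the relaxed header format."""
    hour12 = stored.hour % 12 or 12
    suffix = "am" if stored.hour < 12 else "pm"
    return f"{WEEKDAYS[stored.weekday()]} {day_part(stored.hour)}, {hour12}:{stored.minute:02d}{suffix}"


def day_number(stored: datetime, anchor: datetime) -> int:
    """Day 1 is the narrative anchor date; each calendar day after it adds one."""
    return (stored.date() - anchor.date()).days + 1


anchor = datetime.fromisoformat("2026-04-26T18:00:00+00:00")
stored = datetime.fromisoformat("2026-04-28T21:14:00+00:00")
print(friendly_time(stored))       # Tuesday evening, 9:14pm
print(day_number(stored, anchor))  # 3
```

Time skips would only ever modify the stored value; both display forms are recomputed from it.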

5. Authoring

Authoring is structured, done in a form-based UI. A bot is created once; after that it changes only through reset (full wipe) or by amending its identity fields, which are immutable within a session.

5.1 Authored fields per bot

Identity (immutable per session):

  • Name (required).
  • Persona paragraph — free-form prose; goes into must-include identity block.
  • Voice samples: 1-3 short prose passages in the bot's voice. Stored joined with --- separators. Injected into the speaker prompt as a must-include block: "Voice reference. Match this register, vocabulary, and rhythm: {samples}". Always sent in full in v1 (rotation strategy is Phase 2 fallback if budget pressure forces it).
  • Trait list: free-form list of comma- or newline-separated phrases ("introverted, quick to anger, terrible at small talk, loves cats"). 3-15 items typical, no hard cap. Stored as traits: list[str]. Rendered in prompt as Traits: …. Not Big Five / MBTI; free-form anchors voice without biasing the model.
  • Backstory: free-form prose, single string. Soft target 100-500 words; no hard cap, but trim tier downgrades long backstories under budget pressure.

Per-bot relationship & first encounter:

  • Initial relationship to you: free-form prose ("BotA is my coworker; we've worked together for two years; she has a crush on me she hasn't admitted"). Parsed once into seeded you ↔ bot edge content on first run.
  • Kickoff scene: free-form prose describing the first encounter ("you stay late at the office; only you and BotA are there; she's at her desk pretending to work"). Parsed on first init into structured container, activity, seed edge content, and initial scene state. The user confirms or edits the parsed result before play begins.

5.2 First co-appearance: "have they met?"

The first time two bots appear in the same scene, the orchestrator prompts: Have BotA and BotB met before?

  • Yes → user writes a short prose seed describing how they know each other. Parsed into initial botA ↔ botB edge content.
  • No → edges initialize empty. The bots actually meet on-screen.

5.3 Reset

Reset is a per-bot action with hard confirmation (type the bot's name).

  • Wipes: chat history, memories, scenes, edges involving the bot, current scene state.
  • Preserves: identity, initial edge seed, kickoff scene.
  • After reset, the chat sits ready — kickoff does not auto-play. The next user message triggers kickoff.

5.4 "You" entity authoring (one-time, shared across all chats)

  • Name (required).
  • Pronouns (optional, freeform string; if empty, model infers).
  • Persona blob (optional, recommended) — short paragraph for context ("30, software engineer, lives alone, dry sense of humor"). Provides backdrop the bots can use without overriding per-bot framing.

No voice samples for "you" — the user provides their own dialogue, the bot doesn't need to mimic the user's voice. Per-bot specifics about how each bot knows you stay in the per-bot "initial relationship to you" field (§5.1).

6. Play Loop

6.1 Input convention (mixed prose, novel-style)

A turn is free-form prose with conventional markers:

  • *walks over* — action.
  • Quoted or bare text — dialogue.
  • ((double parens)) — out-of-character commentary or meta-instruction. Flagged but not sent to the bot. (Default; stored as a config field; the user may change it before play begins.)

A small classifier call splits the turn into segments tagged dialogue | action | ooc. Action segments update the user's activity record.
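The design assigns segmentation to the classifier. For mocks and tests, a naive regex pre-pass over the same markers could look like this; split_turn is a hypothetical stand-in, not the classifier, and it ignores nesting and a user-changed OOC marker.

```python
import re

# ((...)) is out-of-character, *...* is action; everything else is treated as dialogue.
SEGMENT_RE = re.compile(r"\(\((?P<ooc>.+?)\)\)|\*(?P<action>.+?)\*")


def split_turn(text: str) -> list[tuple[str, str]]:
    """Split one user turn into (kind, text) segments in reading order."""
    segments: list[tuple[str, str]] = []
    pos = 0
    for match in SEGMENT_RE.finditer(text):
        before = text[pos:match.start()].strip()
        if before:
            segments.append(("dialogue", before))
        if match.group("ooc") is not None:
            segments.append(("ooc", match.group("ooc").strip()))
        else:
            segments.append(("action", match.group("action").strip()))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("dialogue", tail))
    return segments


print(split_turn('*walks over* "Hey, you busy?" ((keep it short))'))
# [('action', 'walks over'), ('dialogue', '"Hey, you busy?"'), ('ooc', 'keep it short')]
```

Unmarked prose falls through as a single dialogue segment, which matches the turn-parse fallback in §3.3.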

6.2 Turn-taking (scene with you + host, optional guest)

  • Addressee gets the floor. With one bot present, the bot replies. With two bots, the addressee bot replies (inferred from prose: name mention, gaze, context).
  • Interjection allowance (only when guest is present): a classifier call decides whether the non-addressee bot interjects this turn. If yes, it produces a short reaction beat after the addressee's reply. If no, it silently witnesses.
  • State-update pass on every present entity after every utterance, not just the speaker. Silent witnesses still update edges (BotB watching BotA say something cruel updates botB → botA).

6.3 Speaker prompt assembly

For the speaking bot, the prompt is assembled from their own state and their own witnessed memories — never from a global view:

  1. Speaker identity + current state (mood, goals).
  2. Speaker → you edge (and speaker → other-bot edge, if guest is present).
  3. Group node (Phase 2+, only if all 3 present).
  4. Chat-state snapshot: time, weather, location.
  5. Active scene description.
  6. Activity snapshot for all present entities (always a small structured block — anchors spatial grounding).
  7. Active threads (Phase 3).
  8. Recent dialogue window.
  9. Retrieved memories (top-K, witness-filtered, speaker-owned).
  10. Currently active events + props (Phase 3).

6.4 Drawer (state visibility)

A collapsible right-side drawer. Closed by default. When open, shows for the current chat:

  • Current scene + container.
  • Activity record per present entity.
  • Edges (host ↔ you, host ↔ guest if any).
  • Recent witnessed memories from the host's POV (with significance markers, · through ★★).
  • Pinned memories in their own section with a n / 8 counter and an unpin affordance per row.
  • Active threads and currently active events (Phase 3).
  • Snapshots panel (Phase 4 surface; data exists from Phase 1).

v1 editable fields (highest-value rescues for LLM errors):

  • Activity record: action.verb, attention, posture (text fields). Fixes "leaning on the kitchen counter while in a car".
  • Edges: affinity, trust (sliders 0-100), summary (textarea), knowledge_facts (add/remove list). Fixes misread moments.
  • Memory: pov_summary (textarea), significance (dropdown 0-3), pin toggle. Fixes wrong summaries.

Read-only in v1 (Phase 4 makes editable):

  • Container properties.
  • Identity fields (immutable per session by design — change via reset).
  • Witness flags (rewriting these silently changes continuity logic).
  • Structural fields (owner_id, scene_id, chat_id).

Every drawer edit goes through the event log as a manual_edit event capturing the prior value, so it is fully reversible.

6.5 Activity record specifics

The activity record per entity carries:

  • current_action.verb: open string ("driving", "writing an email", "lying in bed reading"). No enum.
  • current_action.interruptible: bool. Classifier-extracted alongside the verb on each activity update.
  • current_action.required_attention: low | medium | high. Classifier-extracted.
  • current_action.expected_duration: text estimate ("a few minutes", "an hour", "ongoing"). Classifier-extracted.
  • current_action.started_at: chat-clock snapshot when the action begins.
  • posture: standing | sitting | lying | … (open string; classifier-extracted).
  • position: {container_id, slot_name}.
  • holding: list of items (open strings).
  • attention: optional free-form string ("the road", "her phone", "you", "the document"). Set only when prose makes attention explicit; left empty otherwise. When empty, prompt assembly omits the field rather than rendering "attention: unknown".
  • status: {conscious, sober, injured, …} (open dict).

The activity block in the prompt renders verb + structured properties + posture + attention (if set) so the narrative model has both the natural-language verb and the structured constraints to dramatize ("BotA is driving — high attention, not interruptible, attention on the road").
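A sketch of how that block might be rendered, including the rule that an unset attention field is omitted rather than printed as unknown. The Activity dataclass covers only a subset of the fields listed above, and the output format is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Activity:
    verb: str
    interruptible: bool
    required_attention: str  # "low" | "medium" | "high"
    posture: str
    attention: Optional[str] = None  # set only when the prose makes it explicit


def render_activity(name: str, activity: Activity) -> str:
    """One prompt line per entity: natural-language verb plus structured constraints."""
    interrupt = "interruptible" if activity.interruptible else "not interruptible"
    parts = [
        f"{name}: {activity.verb} ({activity.required_attention} attention, {interrupt})",
        f"posture: {activity.posture}",
    ]
    if activity.attention:  # omitted entirely when empty
        parts.append(f"attention: {activity.attention}")
    return "; ".join(parts)


driving = Activity("driving", False, "high", "sitting", "the road")
print(render_activity("BotA", driving))
# BotA: driving (high attention, not interruptible); posture: sitting; attention: the road
```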

7. Scene Lifecycle

7.1 Starts

  • First-ever scene with a bot → kickoff plays after the kickoff prose is parsed and the user confirms the structured form.
  • Returning to an existing chat → resume in-place. Same container, same activity, same active scene. No auto time advancement.
  • Adding a guest bot → guest just appears in the current scene. The user narrates any in-fiction justification in prose. botA ↔ botB edges initialize per the per-pair "have they met?" answer if it's their first co-appearance.

7.2 Closes (hybrid auto + manual)

A scene closes when one of the following fires:

  • Auto, hard signals:
    • Container change (parsed from prose — "we drove to the park", "we stepped outside").
    • Declared time skip (Phase 3).
    • Explicit user pattern ("we're done here", "fade out", etc.) recognized by classifier.
  • Manual: "Close scene" button in the drawer for soft transitions the user wants bookmarked.

False positives are the bigger risk than false negatives, so the auto-detector errs conservative; manual close is always available.

7.3 On close

  1. Significance classifier pass on the closing scene.
  2. Per-POV summaries written — one per witness, from their angle. No omniscient narration.
  3. Edge updates applied (affinity, trust, summary, knowledge, last-interaction).
  4. Closed events finalize; their props remain in the closed event record (do not promote to memory).
  5. Raw dialogue archived to cold storage.
  6. New active scene is opened (resume or fresh).

7.4 Container authoring (parse-and-extend)

  • Kickoff parse creates the initial container for a bot's chat from the kickoff prose ("you stay late at the office" → container office, type workplace, public, slots auto-named from context).
  • Transitions during play ("we drove to the park", "let me grab a drink from the kitchen") are detected by the scene-close classifier; if a container with that name doesn't exist yet in this chat, it's created on the fly with classifier-inferred defaults; if it does, it's reused.
  • Drawer shows the current container with all fields editable (Phase 4) or read-only with manual creation (v1). New containers can be pre-authored in the drawer if you want.
  • Schema: containers(id, chat_id, name, type, properties_json, parent_id) where properties_json holds {moving: bool, public: bool, audible_range: text, slots: [{name, occupant_id?}]}.
  • Containers are scoped per chat. BotA's "apartment" and BotC's "apartment" are distinct records — no name collision.
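The reuse-or-create behavior and the per-chat scoping can be sketched against the containers schema given above. Column types, the default property values, and the function names are assumptions; the real defaults would come from the classifier.

```python
import json
import sqlite3


def make_containers_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS containers ("
        " id INTEGER PRIMARY KEY, chat_id TEXT NOT NULL, name TEXT NOT NULL,"
        " type TEXT NOT NULL, properties_json TEXT NOT NULL, parent_id INTEGER,"
        " UNIQUE (chat_id, name))"  # names only need to be unique within one chat
    )


def get_or_create_container(conn: sqlite3.Connection, chat_id: str, name: str,
                            inferred_type: str = "unknown") -> int:
    """Reuse the chat's container with this name, or create it with placeholder defaults."""
    row = conn.execute(
        "SELECT id FROM containers WHERE chat_id = ? AND name = ?", (chat_id, name)
    ).fetchone()
    if row:
        return row[0]
    props = {"moving": False, "public": False, "audible_range": "room", "slots": []}
    with conn:
        cur = conn.execute(
            "INSERT INTO containers (chat_id, name, type, properties_json, parent_id)"
            " VALUES (?, ?, ?, ?, NULL)",
            (chat_id, name, inferred_type, json.dumps(props)),
        )
    return cur.lastrowid
```

Two chats can each hold an "apartment" without colliding because the lookup and the uniqueness constraint both include chat_id.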

7.5 Guest leaves mid-scene

Adding a guest then having them leave is a normal flow. What happens:

  • The user removes the guest via the drawer ("Remove BotB from scene"), or BotB exits in prose ("BotB grabs her coat and heads out") — the classifier detects exit on the next turn and prompts for confirmation.
  • The current scene closes at the moment of guest exit (this counts as a hard close signal). Per-POV summaries are written for all three witnesses including BotB. Edges update.
  • A new scene immediately opens with just you + host bot, in the same container, with carry-over activity. No need to re-narrate "now we're alone".
  • Witness flags from the closed scene stay [1, 1, 1] (you, host, guest) for the period BotB was present; that data is permanent and travels with the memory.
  • The group node persists for the chat (its content is updated on close, not deleted), available for future co-presences in this chat. If BotB never returns, the group node just sits unused.
  • BotB's chat clock is unchanged — it remains wherever it was when last visited (per §4.1).
  • BotB's memories of the scene are written to BotB's memory store, available next time you talk to BotB alone.

8. Memory Retrieval

8.1 Always-loaded (no retrieval cost)

  • Pinned memories (user pins from the drawer).
  • Current scene's running dialogue window.
  • Active threads (Phase 3).
  • Last N scenes' per-POV summaries from the current chat (N tunable; default 3).

8.2 Retrieved (top-K)

  • Phase 1: SQLite FTS5 over memories.pov_summary for the speaker. Filter WHERE owner_id = speaker AND witness_bit_for_speaker = 1 (hard SQL constraint, not a soft signal).
  • Phase 4: vector search via sqlite-vss/sqlite-vec, same filter.
  • Default K = 4. Recency boost + significance boost in ranking.

8.3 Witness rule (non-negotiable)

A bot cannot retrieve memories whose witness bit for them is 0. Period. This is the mechanism preventing bots from referencing things they couldn't possibly know.
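The witness rule as a hard SQL constraint can be sketched as follows. Several things here are assumptions: the mask is stored as an integer with bits for you/host/guest, each row also stores owner_bit (the owner's role in that memory's chat), and a plain LIKE match stands in for FTS5 so the example runs on any SQLite build.

```python
import sqlite3

YOU, HOST, GUEST = 0b100, 0b010, 0b001  # hypothetical bit layout for the 3-bit mask


def make_memory_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memories ("
        " id INTEGER PRIMARY KEY, owner_id TEXT NOT NULL,"
        " owner_bit INTEGER NOT NULL, witness_mask INTEGER NOT NULL,"
        " pov_summary TEXT NOT NULL, significance INTEGER NOT NULL)"
    )


def retrieve(conn: sqlite3.Connection, speaker_id: str, keyword: str, k: int = 4) -> list[str]:
    """Top-K summaries owned by the speaker AND witnessed by the speaker."""
    rows = conn.execute(
        "SELECT pov_summary FROM memories"
        " WHERE owner_id = ?"
        " AND (witness_mask & owner_bit) != 0"  # the hard filter, not a ranking signal
        " AND pov_summary LIKE ?"
        " ORDER BY significance DESC, id DESC LIMIT ?",
        (speaker_id, f"%{keyword}%", k),
    ).fetchall()
    return [row[0] for row in rows]
```

Because the filter sits in the WHERE clause, no ranking boost can pull an unwitnessed memory into the prompt.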

8.4 Cross-chat memory

A bot's memory store contains memories from any chat the bot has been in (host or guest). All are retrievable. Bots may reference cross-chat events naturally; precise cross-chat time arithmetic is not attempted.

8.5 Pinning

  • User pins via the drawer (pin icon next to a memory row).
  • Pinned memories are always-loaded for the speaker (no retrieval cost).
  • Soft cap: 8 pins per bot.
  • Pivotal-significance (score 3) memories are auto-pinned for the witness whose POV they're in.
  • When over cap, the oldest auto-pin is unpinned (not deleted — just removed from the pinned set; still findable via retrieval). Manual pins are never auto-evicted — user must unpin manually.
  • Drawer shows pinned memories in their own section with a n / 8 counter and an unpin affordance per row.
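The eviction rule can be sketched as a pure function. The Pin type and add_pin are hypothetical. One behavior the bullets leave open is what happens when the cap is exceeded and the only auto-pin is the one just added; this sketch evicts it, and lets the soft cap be exceeded when every pin is manual.

```python
from dataclasses import dataclass

PIN_CAP = 8  # soft cap per bot


@dataclass
class Pin:
    memory_id: int
    auto: bool      # True for auto-pins of pivotal memories, False for manual pins
    pinned_at: int  # increasing counter; lower means older


def add_pin(pins: list[Pin], new_pin: Pin, cap: int = PIN_CAP) -> list[Pin]:
    """Return the new pinned set, unpinning the oldest auto-pins while over the cap."""
    result = pins + [new_pin]
    while len(result) > cap:
        auto_pins = [p for p in result if p.auto]
        if not auto_pins:
            break  # manual pins are never auto-evicted
        result.remove(min(auto_pins, key=lambda p: p.pinned_at))
    return result
```

Unpinning only changes membership in the pinned set; the memory row itself is untouched and stays reachable through retrieval.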

9. Time, Skips, Events (Phase 3 surface)

  • Each chat has its own clock; advances only on explicit user skip commands within that chat.
  • Elision skip — "skip to when we arrive". Resolves to end-of-current-action; activity completes; landing state set; brief transition narration generated.
  • Jump skip — "next morning", "a week later". User is prompted: "anything notable happen?" Answer becomes synthesized memory(ies) for the speaker bot via classifier call. Chat clock advances. Activity is reset to a coherent landing state.
  • Events — lifecycles (planned | active | completed | cancelled | expired) with their own scoped props. Promotion rules on close (only the four categories: object acquired → inventory, knowledge gained → edge knowledge, relationship change → edge summary, everything else stays in closed record).

Phase 1 has no skips and no events. Time is set at kickoff and stays put unless the bot is reset.

10. Rollback, Regenerate, Reset

10.1 Rewind

  • Button on every turn. Truncates event log past that turn; rebuilds projection.

  • Pre-rewind snapshot is always taken automatically before truncating. Stored in data/snapshots/rewind/. Retained 14 days then pruned.

  • Confirmation modal shows an impact preview:

    Rewind to turn 47?
    
    This will remove:
      • 12 messages (turns 48-59)
      • 1 scene transition (drive to park)
      • 2 edge updates (BotA → You, Group)
      • 3 memories from BotA's store
      • 1 fired event (arrived at park)
      • 1 manual edit (BotA affinity)
    
    A pre-rewind snapshot will be saved automatically.
    
    [ Cancel ]   [ Rewind ]
    
  • After successful rewind, a 30-second toast appears: "Rewound 12 turns. [Undo]" — clicking restores from the pre-rewind snapshot. After the toast dismisses, the snapshot is still on disk and reachable from the drawer's snapshots panel (Phase 4 UI; data is there from Phase 1).

  • "Rewind, keep current as branch" is Phase 4.
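Rewind as "truncate the log, rebuild the projection" can be sketched in memory. The event shape, the toy projection, and the in-memory snapshot list are assumptions; the real system would persist the snapshot under data/snapshots/rewind/.

```python
def project(events: list[dict]) -> dict:
    """Rebuild read-side state by replaying events in order (toy projection)."""
    state = {"turns": [], "affinity": 50}
    for event in events:
        if event["kind"] in ("user_turn", "assistant_turn"):
            state["turns"].append(event["text"])
        elif event["kind"] == "edge_update":
            state["affinity"] += event["affinity_delta"]
    return state


def rewind(events: list[dict], snapshots: list[list[dict]], to_turn: int):
    """Snapshot the full log, then keep only events up to and including to_turn."""
    snapshots.append(list(events))  # the pre-rewind snapshot is always taken first
    kept = [e for e in events if e["turn"] <= to_turn]
    return kept, project(kept)


def undo_rewind(snapshots: list[list[dict]]):
    """Restore the most recent pre-rewind snapshot (what the Undo toast would do)."""
    events = snapshots.pop()
    return events, project(events)
```

Edge updates, memories, and manual edits disappear with their events, which is why the impact preview can be computed by counting the events past the target turn.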

10.2 Regenerate (inline, not modal)

Clicking Regenerate on the latest bot turn:

  1. Scrolls to your last user turn and puts it into an inline edit state (textarea pre-filled with your prose).
  2. The bot's response below shows a faded "regenerating…" placeholder.
  3. Submit button is Regenerate (not Send). Hitting it:
    • If you edited: appends a user_turn_edit event capturing the new prose, then a new assistant_turn event with the new generated response.
    • If you didn't edit: appends only a new assistant_turn event.
  4. The previous assistant_turn event is superseded, not deleted — kept in the log with a superseded_by pointer so it's recoverable. Display hides it.
  5. Downstream classifier passes (state-update, significance) re-run on the new response.
  6. [ Cancel ] reverts to the original (no event written).

10.3 Reset bot

Full wipe with hard confirm (type bot name). Behavior detailed in §5.3.

10.4 Snapshot frequency & retention

  • Periodic snapshot: every 100 events OR every 30 minutes of activity, whichever first. Stored in data/snapshots/periodic/.
  • Retention: keep last 5 periodic snapshots. Pre-rewind snapshots retained 14 days.
  • Cold-load behavior: replay starts from the most recent periodic snapshot, then applies events forward. Bounds replay cost on app start.
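The trigger and retention rules reduce to two small functions. This sketch approximates "30 minutes of activity" with elapsed wall-clock time since the last snapshot and assumes snapshot names sort chronologically (for example, ISO timestamps); both are assumptions, as are the function names.

```python
from datetime import datetime, timedelta


def snapshot_due(events_since_last: int, last_snapshot_at: datetime, now: datetime,
                 max_events: int = 100, max_age: timedelta = timedelta(minutes=30)) -> bool:
    """True when either threshold is reached, whichever comes first."""
    return events_since_last >= max_events or (now - last_snapshot_at) >= max_age


def prune_periodic(snapshot_names: list[str], keep: int = 5) -> list[str]:
    """Keep only the newest `keep` snapshots; names must sort oldest to newest."""
    return sorted(snapshot_names)[-keep:]
```

On cold load, replay would start from the last name returned by prune_periodic and apply only the events recorded after it.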

10.5 Deferred to Phase 4

Branching with UI, hide-from-view soft delete, surgical delete + cascade with impact preview. Mechanically supported by event sourcing already; no v1 UI surface.

11. Compression & Promotion

11.1 Significance rubric

Classifier call after each turn (queued, async) tags the turn 0-3. Scene significance is the max turn-significance within the scene.

| Score | Name | Definition | Examples |
|---|---|---|---|
| 0 | Routine | Banter, small talk, ordinary action. Forgettable on its own; aggregates only via edge stats. | "Hi, how was your day?" / "Fine, you?" |
| 1 | Notable | A specific detail, opinion, or beat worth remembering but not arc-changing. Default for non-trivial dialogue. | BotA mentions a band she likes; you discover BotB hates a food. |
| 2 | Significant | A scene-level moment: meaningful confession, real disagreement, a date, a confided secret. | First date; BotA tells you about her sister; an argument. |
| 3 | Pivotal | A relationship-altering event. Updates edge summary and (often) affinity substantially. Always auto-pinned. | First kiss; betrayal; "I love you"; learning a defining secret. |

Where each level is used:

  • Retrieval ranking: significance multiplier applied as score × constant to FTS / vector rank.
  • Compression: scenes with max-turn-significance ≥ 2 retain key quotes; ≤ 1 collapse fully into the per-POV summary.
  • Exports: scene-close JSON written to data/exports/ when scene significance ≥ 2.
  • Auto-pin: turns scored 3 are auto-pinned for each witness whose POV they're in.
  • UI hint: drawer renders the score as one of four glyphs, from · (score 0) to ★★ (score 3).

Tie-breakers: turn significance is the max across emotional, factual, and relational facets; the classifier returns the max. Conservative bias — when uncertain, score lower.
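The thresholds in this section can be collected into a few predicates. The function names are hypothetical; the numeric cut-offs come from the rubric and the usage list above. The retrieval multiplier is left out because its constant is not specified.

```python
def scene_significance(turn_scores: list[int]) -> int:
    """Scene significance is the maximum turn score; an empty scene counts as routine."""
    return max(turn_scores, default=0)


def retains_key_quotes(scene_score: int) -> bool:
    """Scenes at 2 or above keep key quotes; lower scenes collapse into the POV summary."""
    return scene_score >= 2


def should_export(scene_score: int) -> bool:
    """Scene-close JSON is written only for significant or pivotal scenes."""
    return scene_score >= 2


def should_auto_pin(turn_score: int) -> bool:
    """Only pivotal turns (score 3) are auto-pinned."""
    return turn_score == 3
```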

11.2 Per-POV summaries

Written per witness when a scene closes. Different details, different interpretations. No omniscient narration.

11.3 Promotion rules ("picnic basket" rule)

  • Object acquired → entity inventory.
  • Knowledge gained → relevant edge's knowledge.
  • Relationship change → edge summary.
  • Everything else stays in the closed event/scene record. Surfaces only on explicit recall.

11.4 Compression tiers

  • Last scene: full dialogue retained.
  • Recent scenes: per-POV summary + key quotes (only if significance ≥ 2; else summary only).
  • Older scenes: per-POV summary only.
  • Distant past: rolled into edge summaries.

12. Persistence & Ops (v1 defaults)

  • SQLite WAL mode, foreign keys on, transactional turns.
  • Project-folder layout (DB lives inside the repo, gitignored):
    • DB: <repo>/data/chat.db
    • Backups: <repo>/data/backups/ (timestamped copies)
    • Snapshots: <repo>/data/snapshots/ (pre-rewind in rewind/, periodic in periodic/)
    • Significant-scene JSON exports: <repo>/data/exports/
    • Config: <repo>/data/config.toml (holds Featherless API key, model names, OOC marker, K, budget, etc. Gitignored.)
    • The entire data/ tree is in .gitignore so secrets and state never get committed.
    • CHAT_DB_PATH env var honored as an override if you want to point at a different file (e.g., a backup or a sibling repo's data).
  • Auto-backup nightly via launchd. Timestamped copies. Last 14 retained. Pre-rewind snapshots are separate and follow their own 14-day retention (§10.1, §10.4).
  • Significant-scene JSON exports written to data/exports/ when scenes close at significance ≥ 2.
  • Schema versioned in a meta table; migrations applied on startup.
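A startup routine covering WAL mode, foreign keys, and the versioned meta table might look like this sketch. The two migrations and the function name are placeholders, not the project's schema, and each migration is assumed to be a single statement.

```python
import sqlite3

# Ordered, append-only list; the index + 1 is the schema version (placeholder DDL).
MIGRATIONS = [
    "CREATE TABLE bots (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "CREATE TABLE chats (id INTEGER PRIMARY KEY,"
    " host_bot_id INTEGER NOT NULL REFERENCES bots(id))",
]


def open_and_migrate(path: str) -> sqlite3.Connection:
    """Open the database, set pragmas, and apply any migrations not yet recorded."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # has no effect on an in-memory database
    conn.execute("PRAGMA foreign_keys=ON")   # off by default; must be set per connection
    conn.execute("CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT NOT NULL)")
    row = conn.execute("SELECT value FROM meta WHERE key = 'schema_version'").fetchone()
    version = int(row[0]) if row else 0
    for number, statement in enumerate(MIGRATIONS[version:], start=version + 1):
        conn.execute(statement)
        with conn:  # record the new version once the statement has run
            conn.execute(
                "INSERT OR REPLACE INTO meta (key, value) VALUES ('schema_version', ?)",
                (str(number),),
            )
    return conn
```

Running it again against the same file is a no-op, since only migrations past the recorded version are applied.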

13. Phase Cut

Phase 1 (v1) — must build end-to-end before any Phase 2 work

  • Featherless client (OpenAI-compatible) with narrative_model and classifier_model configured.
  • Schema, event log, projector, replay.
  • Bot authoring UI (form-based) including kickoff prose + parse-and-confirm.
  • Single-bot chat (host only, no guest yet):
    • Mixed-prose input with ((parens)) OOC marker.
    • Addressee = host. No interjection logic yet.
    • Narrative streaming.
    • Post-turn state-update pass.
  • Drawer: read-only first; edit-on-demand may land in Phase 1.5.
  • Scene close (hard-signal auto + manual button) with per-POV summary for the host bot only.
  • Memory: witness flag stored, FTS5 retrieval (K=4), recency + significance boost.
  • Rewind + regenerate (with edit-then-regenerate).
  • Reset bot (hard confirm).
  • Per-chat clock (set at kickoff; no skips yet).
  • Nightly backups + pre-rewind snapshots.

Phase 2 — multi-entity

Status: shipped 2026-04-26 — multi-entity scene support, guest add/remove drawer UX, guest-aware prompt assembly, multi-entity turn flow with interjection classifier, per-POV scene close summaries for every present witness, group_node initialization/update, and bot reset cascade clearing stale chats.guest_bot_id references all landed across the wave5 task series (see CLAUDE.md § "Phase 2 status" for the deliverable summary and follow-ups).

  • Guest bot in chat (3-entity scene config).
  • Interjection classifier call.
  • Witness filtering across multiple owners.
  • Group node (when all three present).
  • Per-pair "have they met?" prompt.
  • botA ↔ botB edges.

Phase 3 — events, skips, threads

Status: shipped 2026-04-26 (T49–T67, 19 tasks across 8 waves; schema baseline now version 11; +68 tests). See "Phase 3 status" in CLAUDE.md for the per-task breakdown.

  • Events with lifecycles and scoped props.
  • Time skips: elision and jump.
  • Active threads.
  • Significance classifier improvements.
  • "Meanwhile…" scenes (scene config 4) — autonomous.

Phase 4 — polish

Status: shipped 2026-04-27 (T88–T102, 15 tasks across 8 waves; +70 tests). See "Phase 4 status" in CLAUDE.md for the per-task breakdown. Vector retrieval shipped via pure-Python cosine over a JSON-blob embeddings table (sqlite-vec deferred because the host Python lacks loadable extensions); branching is data-model + drawer UI; significance review, hide-from-view soft delete, surgical delete with cascade preview, snapshot UX, and cross-chat search all surface from the drawer or top-bar.

  • Vector retrieval (sqlite-vss or sqlite-vec).
  • Branching UI.
  • Drawer-edit on every field.
  • Backup tooling improvements.
  • Significance review UI.
  • Surgical delete + cascade with impact preview.
  • Hide-from-view soft delete.
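
The Phase 4 status note describes vector retrieval as pure-Python cosine similarity over a JSON-blob embeddings table. A minimal sketch of that approach might look like the following. The table name `embeddings` and its columns `memory_id` and `vector` are assumptions for illustration; the shipped schema may differ.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(conn: sqlite3.Connection, query_vec, k=4):
    """Score every stored vector against the query and return the best k ids."""
    rows = conn.execute("SELECT memory_id, vector FROM embeddings").fetchall()
    scored = [(cosine(query_vec, json.loads(blob)), mid) for mid, blob in rows]
    scored.sort(reverse=True)
    return [(mid, score) for score, mid in scored[:k]]
```

This is a full scan on every query, which is the trade-off for not needing a loadable SQLite extension. It is likely acceptable for a single-user memory store and would need revisiting if the table grows large.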

14. Open / Deferred Decisions

All round-1 and round-2 brainstorm decisions are resolved and folded into §3–§12 / §16. Honest list of what is still deferred after this brainstorm:

  • Embedding model (Phase 4 — pick whatever's cheap and good enough on Featherless or local at the time).
  • sqlite-vss vs sqlite-vec (Phase 4 — pick based on each project's state at the time).
  • Exact prompt templates for narrative + each classifier job — drafted against real prompts during Phase 1.
  • Schema column-level types and FK details — finalized during Phase 1 schema implementation.
  • Token-counting accuracy — accept ~5% drift from tiktoken cl100k vs actual Mistral / Llama tokenizers; revisit if drift causes real budget overruns.
  • "Snapshots" panel UX in the drawer — Phase 4. Data is written from Phase 1, read-back UI lands later.
  • In-fiction time auto-advance during a scene — Phase 1 freezes the chat clock between user-initiated skips. If this feels stale during Phase 1 play, revisit with model-narrated parsing in Phase 3 (§9).
  • Search across chat history — out of scope for v1. Phase 4 if needed.
  • Avatars / portraits — out of scope (multimodality is deferred indefinitely).
  • Performance targets — measured against real prompts in Phase 1; no preset SLOs.

15. Non-Negotiables (rules every implementer must respect)

  1. State changes go through the event log. Never UPDATE a state row directly; append an event, let the projector apply it.
  2. Witness filter every memory read. A bot must not retrieve memories whose witness bit for them is 0.
  3. Edges are directed. botA → botB and botB → botA are independent.
  4. Don't promote event props to memory. Only the four promotion categories (object, knowledge, relationship change, narrative gist via summary).
  5. Per-POV, not omniscient. Scene summaries are written per witness, from their angle.
  6. Activity block is always in the prompt. Spatial grounding prevents "leaning on the kitchen counter while in a car" failures.
  7. Streaming on the inference path; non-blocking bookkeeping runs while the LLM streams.
  8. No extra services. SQLite + a process. Push back on suggestions to add infrastructure.
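
Rules 1 and 3 can be illustrated together with a small sketch: callers only append events, a projector is the single place that writes state rows, and each edge direction is its own row. The table layout, the `edge_delta` event kind, and the function names are hypothetical and far simpler than the real schema.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE events (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            kind TEXT NOT NULL,
            payload TEXT NOT NULL
        );
        CREATE TABLE edges (
            src TEXT NOT NULL,
            dst TEXT NOT NULL,
            affinity INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (src, dst)  -- directed: (A, B) and (B, A) are separate rows
        );
        """
    )
    return conn


def project(conn, kind, payload):
    """The projector: the only code path allowed to touch state tables."""
    if kind == "edge_delta":
        key = (payload["src"], payload["dst"])
        conn.execute("INSERT OR IGNORE INTO edges (src, dst) VALUES (?, ?)", key)
        conn.execute(
            "UPDATE edges SET affinity = affinity + ? WHERE src = ? AND dst = ?",
            (payload["affinity"], *key),
        )


def append_event(conn, kind, payload) -> int:
    """Append to the log and project in one transaction."""
    with conn:
        cur = conn.execute(
            "INSERT INTO events (kind, payload) VALUES (?, ?)",
            (kind, json.dumps(payload)),
        )
        project(conn, kind, payload)
    return cur.lastrowid


def replay(conn):
    """Rebuild state from scratch by re-projecting the whole log in order."""
    with conn:
        conn.execute("DELETE FROM edges")
        rows = conn.execute("SELECT kind, payload FROM events ORDER BY seq").fetchall()
        for kind, payload in rows:
            project(conn, kind, json.loads(payload))
```

Because state is derived entirely from the log, `replay()` should reproduce the same `edges` rows that incremental projection built, which is what makes rewind and rebuild cheap.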

16. UI Shape & Flow

16.1 Top-level navigation

Persistent left rail with three sections:

  • Chats (default). List of every authored bot, each row showing: bot name, last message snippet, last-played-at (real time), unread/idle indicator, current chat clock (in-fiction time). Click → opens that chat. Single active chat per browser tab.
  • Bots. Library view. List of authored bots with thumbnails (initials/avatar later), edit / reset / delete actions per bot. "New bot" button at the top.
  • Settings. Single screen: Featherless API key, model overrides, OOC marker, K, token budgets, "you" entity authoring, theme.

Top-of-rail: "+" button creates a new bot (jumps to authoring); after author + kickoff, you land in the new chat.

16.2 First-run experience

On fresh install (empty DB):

  1. App boots → "Set up your profile" screen — fills in the "you" entity (name, pronouns, persona blob).
  2. After save → "Add your first bot" — bot authoring form with inline guides.
  3. After save → kickoff parse-and-confirm — orchestrator parses kickoff prose, displays the parsed structured form (container, activity, seed edges) with edit affordances; user confirms.
  4. Lands in the chat, ready for first turn.

Step 1 is skip-able (revisit in Settings); step 2 is required (no chats without a bot).

16.3 Display formatting

  • Bot output rendered as lightweight markdown — paragraphs, italics, bold, blockquotes. No headings or code blocks (foreign to RP prose).
  • Action segments (*walks over*) rendered as italics; dialogue rendered plain.
  • User input uses the same render rules. The textarea is plain — no live preview.
  • OOC ((parens)): shown in the transcript with italic + dimmed + smaller font, set off from surrounding prose. Always visible to you (you should see your own meta-commentary). Stripped from the prompt the bot sees.
  • Speaker labels: bot turns prefixed with the bot's name in bold; your turns prefixed with your "you" name. Tight spacing.
  • Stream rendering: tokens append as they arrive; markdown re-rendered on each chunk. Cursor indicator at the trailing edge while streaming.
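
The OOC and action rules above can be approximated with two regular expressions. The sketch below hard-codes the default `((…))` marker (the real marker is configurable) and treats single-asterisk spans as actions. The function names are illustrative, and this regex split is a simplification that would misread `**bold**` text; it is not a substitute for the classifier-based turn parsing.

```python
import re

OOC_RE = re.compile(r"\(\((.*?)\)\)", re.DOTALL)  # default ((double parens)) marker
ACTION_RE = re.compile(r"\*([^*]+)\*")            # *walks over*


def extract_ooc(text: str) -> list:
    """Return OOC asides so the transcript can still display them."""
    return [m.strip() for m in OOC_RE.findall(text)]


def strip_ooc(text: str) -> str:
    """Remove OOC asides before the text is placed in the bot's prompt."""
    without = OOC_RE.sub("", text)
    return re.sub(r"[ \t]{2,}", " ", without).strip()


def split_segments(text: str) -> list:
    """Split prose into ("action", ...) and ("dialogue", ...) segments in order."""
    segments = []
    pos = 0
    for match in ACTION_RE.finditer(text):
        before = text[pos:match.start()].strip()
        if before:
            segments.append(("dialogue", before))
        segments.append(("action", match.group(1).strip()))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("dialogue", tail))
    return segments
```

A renderer could italicize the `action` segments and leave `dialogue` plain, while the prompt builder would use only the output of `strip_ooc()`.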

16.4 Streaming UX

  • Typing indicator: the bot's row appears immediately with name + "…" pulse, then tokens fill.
  • Stop button on the streaming bot row halts mid-stream. On stop:
    • Partial response is committed as a normal assistant_turn event with truncated: true.
    • Downstream classifier passes still run on the partial text (state-update, significance) — partial is fine, the bot really did "say up to here".
    • You can immediately type your next turn or click Regenerate to redo the whole response.
  • Send-while-streaming: disabled. Input box is locked during stream. Stop first.
  • Mid-stream disconnect (Featherless drops, network blip, page refresh): treated as an interrupt with truncated: true, partial text committed if any was received. UI surfaces "connection lost — partial response saved" banner with a Regenerate button.
  • Multi-tab streaming: tokens stream to all subscribed tabs simultaneously via the per-chat SSE channel (§3.1). Stop from any tab interrupts for everyone.
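
The stop and disconnect rules above share one shape: whatever text arrived is committed as an `assistant_turn` with `truncated: true`. A minimal asyncio sketch of that consumer follows. The function name, the use of `ConnectionError` to model a dropped stream, and the choice to return `None` when nothing was received are assumptions, not the orchestrator's actual interface.

```python
import asyncio
from typing import AsyncIterator, Optional


async def consume_stream(tokens: AsyncIterator[str],
                         stop: asyncio.Event) -> Optional[dict]:
    """Collect streamed tokens and build the event payload to commit."""
    parts = []
    truncated = False
    try:
        async for token in tokens:
            if stop.is_set():      # Stop pressed in any tab
                truncated = True
                break
            parts.append(token)
    except ConnectionError:        # backend dropped or network blip
        truncated = True
    text = "".join(parts)
    if truncated and not text:
        return None                # nothing received, so nothing to commit
    return {"kind": "assistant_turn", "text": text, "truncated": truncated}
```

The returned payload would then go through the normal event-append path, so downstream classifier passes see a truncated turn the same way they see a complete one.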

16.5 Error UX surface

  • Featherless API down / unauthorized / out of credits: failed turn shows an error banner inline ("Featherless: 401 unauthorized" / "rate limited" / "service unavailable") with a Retry button. No event committed on hard failure — the turn never happened.
  • Featherless slow (> 30s before first token): warning banner — "Slow response — still waiting…" with the same Stop button. No automatic abort.
  • Classifier failure (after fallback ran): silent — fallback values used (per §3.3). Logged to classifier_failures table. Drawer dev panel surfaces a count. No user-facing notification in normal play.
  • DB write failure (rare; disk full, file lock): hard error modal — "Couldn't save your turn — fix and retry". Orchestrator does not advance state.
  • Schema-migration failure on startup: app refuses to launch, prints the error and the path to the DB. No automatic repair.
  • Missing config / first-run with empty config: redirected to Settings before first chat opens.
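
The silent classifier-failure path (validate, retry once, then fall back to schema defaults and log) can be sketched as below. A hand-written validator stands in for the Pydantic-constrained schema, and the failures list stands in for the `classifier_failures` table. The default score of 0, the function names, and the exception set are assumptions; the refusal-triggered model swap and the 10s timeout are not modeled here.

```python
import json
from typing import Callable

# Assumed schema default for a significance call: 0 = Routine.
DEFAULT_SIGNIFICANCE = {"score": 0}


def validate_significance(raw: str) -> dict:
    """Parse classifier output and enforce the 0..3 rubric."""
    data = json.loads(raw)
    score = data["score"]
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 3:
        raise ValueError(f"score out of range: {score!r}")
    return {"score": score}


def classify_with_fallback(call: Callable[[], str],
                           failures: list,
                           retries: int = 1) -> dict:
    """Try the classifier, retry on bad output, then fall back to defaults."""
    for attempt in range(retries + 1):
        try:
            return validate_significance(call())
        except (ValueError, KeyError, TypeError, TimeoutError) as exc:
            # Record every failure; the user never sees these in normal play.
            failures.append({"attempt": attempt, "error": repr(exc)})
    return dict(DEFAULT_SIGNIFICANCE)
```

The caller always gets a usable value, so a flaky classifier degrades scoring quality without blocking the turn.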

Appendix A — Decisions Log (this brainstorm)

| # | Decision | Choice |
|---|---|---|
| 1 | Primary experience | Relational + companion RP |
| 2 | Content scope | Explicit content allowed |
| 3 | Backend | Featherless (OpenAI-compatible API), Dolphin-Mistral-24B-Venice for narrative |
| 4 | Authoring | Rich structured fields |
| 5 | Input style | Mixed prose, novel-style |
| 6 | Multi-bot turn-taking | Addressee + interjection allowance |
| 7 | Bot autonomy | Strictly turn-based for v1; "meanwhile…" deferred to P2-3 |
| 8 | Sessions/campaigns | Bots are persistent units; one chat per bot; guests addable; bot-owned memory |
| 9 | First co-appearance | Per-pair "have they met?" prompt |
| 10 | Scene starts | Kickoff for new; resume in-place for returning; just-appear for guests |
| 11 | State visibility | Optional drawer, closed by default, read-only with edit-on-demand |
| 12 | Scene close | Hybrid (hard-signal auto + manual button) |
| 13 | Time | Per-chat clock, advanced only by explicit user skips |
| 14 | Model strategy | Small classifier model + large narrative model |
| 15 | Reset | Full wipe + hard confirm; chat sits ready for kickoff |
| 16 | Rollback | Rewind + regenerate (with edit-then-regenerate) |
| 17 | UI framework | FastAPI + HTMX + SSE; multi-tab sync as a Phase 1 requirement |
| 18 | Classifier model | NousResearch/Hermes-3-Llama-3.1-8B (fallbacks: dolphin-2.9.4-llama3-8b, Meta-Llama-3.1-8B-Instruct-abliterated) |
| 19 | Token budgets | Narrative 8K hard / 6K soft; classifier 4K hard. Must/Should/Nice tiers per §3.2 |
| 20 | OOC marker | ((double parens)), configurable |
| 21 | DB location | Project-folder <repo>/data/ tree (DB, backups, snapshots, exports, config). Gitignored. CHAT_DB_PATH env var honored |
| 22 | Significance rubric | 0=Routine, 1=Notable, 2=Significant, 3=Pivotal. Uses across retrieval, compression, exports, auto-pin, drawer. Score-3 turns auto-pinned per witness |
| 23 | Edge update granularity | Per-turn deltas (affinity, trust, knowledge_facts, last_interaction); per-scene-close summary rewrite |
| 24 | Classifier failure handling | Pydantic-constrained → 1 retry → schema-default fallback. Refusal triggers fallback model swap for that call. 10s timeout. Failures logged |
| 25 | "You" entity | Name (req) + pronouns (opt) + persona blob (opt). Per-bot relationship handles bot-specific framing |
| 26 | Voice samples | 1-3 samples per bot; always must-include in v1 |
| 27 | Trait list | Free-form list of phrases (3-15 typical); not Big Five / MBTI |
| 28 | Backstory | Free-form prose, 100-500 words target |
| 29 | Pinning | Soft cap 8 / bot. Score-3 auto-pins. Manual pins never auto-evicted. Drawer surface with n/8 counter |
| 30 | Containers | Parse-and-extend: kickoff parse seeds initial container; transitions create new; per-chat scoped |
| 31 | Activity verbs | Open string for verb; classifier extracts interruptible, required_attention, expected_duration alongside |
| 32 | Attention | Optional free-form string; omitted from prompt when empty |
| 33 | Drawer edit cut (v1) | Editable: activity (action/attention/posture), edges (affinity/trust/summary/knowledge_facts), memory (pov_summary/significance/pin). Read-only: container, identity, witness, structural |
| 34 | Snapshots | Periodic every 100 events / 30 min; pre-rewind always; 5 periodic retained, pre-rewind 14 days |
| 35 | Rewind UX | Modal with structured impact preview; pre-rewind snapshot auto-saved; 30s undo toast |
| 36 | Regenerate UX | Inline (no modal). Edit-then-regenerate. Old assistant_turn superseded, not deleted |
| 37 | Top-level nav | Three-section left rail: Chats / Bots / Settings |
| 38 | First-run | You-profile → first-bot author → kickoff parse-and-confirm → chat |
| 39 | Display formatting | Lightweight markdown; *action* italic; OOC dimmed/italic/smaller; speaker labels bold |
| 40 | Chat clock format | Stored ISO 8601 UTC datetime; displayed friendly relative ("Tuesday evening, 9:14pm") |
| 41 | Streaming UX | Typing indicator, Stop button, Send disabled mid-stream, multi-tab sync via SSE, mid-stream disconnect = truncated commit |
| 42 | Error UX | Featherless errors inline w/ Retry; classifier fails silent w/ fallback; DB write fails modal-blocking; schema migration fails launch-blocking |
| 43 | Guest leaves | Auto scene-close on guest exit; per-POV summaries for all 3 incl. guest; new scene immediately opens with you+host; group node persists for chat |