docs: flesh out behavioral specs in v1 requirements (round 2)

Resolves the deferred operational and UX semantics from round 2 of the brainstorm. Decisions 22–43 in the appendix decisions log.

New / expanded sections:

- §3.3 Classifier failure handling (Pydantic-constrained + retry + schema-default fallback, 10s timeout, refusal triggers fallback-model swap).
- §3.4 Edge update granularity (per-turn deltas + per-scene-close summary rewrite; all mutations go through edge_update events).
- §4.3 Chat clock format (stored ISO 8601 UTC; displayed friendly relative).
- §5.1 Authoring expanded (voice samples format, trait list as free-form phrases, backstory length target).
- §5.4 "You" entity authoring (one-time, shared).
- §6.4 Drawer expanded (v1 editable fields cut: activity, edges, memory; read-only: container, identity, witness, structural; manual_edit events).
- §6.5 Activity record specifics (open verb + classifier-extracted props).
- §7.4 Container authoring (parse-and-extend, per-chat scoped).
- §7.5 Guest leaves mid-scene (auto close + new scene with you+host).
- §8.5 Pinning (soft cap 8, score-3 auto-pin, manual pins never auto-evict).
- §10 Rollback expanded with full impact-preview modal, snapshot frequency (100 events / 30 min periodic, pre-rewind always), inline regenerate UX with edit-then-regenerate.
- §11.1 Significance rubric (0=Routine, 1=Notable, 2=Significant, 3=Pivotal) with usage and tie-breakers.
- §16 UI Shape & Flow (top-level nav, first-run experience, display formatting, streaming UX, error UX).

CLAUDE.md adds a "Behavioral defaults (round 2)" section flagging the load-bearing rules for future Claude sessions.

§14 Open / Deferred Decisions trimmed to the genuinely-still-open list (embedding model, vector index choice, prompt templates, search, etc.).
2026-04-26 11:11:46 -04:00
parent 5869f1c5ce
commit 8a6b48be11
2 changed files with 308 additions and 30 deletions
@@ -39,6 +39,18 @@ The 3-entity cap is load-bearing: it makes the relationship graph fully enumerab
 - **Data layout**: everything under `<repo>/data/` — `chat.db`, `backups/`, `snapshots/`, `exports/`, `config.toml`. The whole tree is `.gitignore`d. `CHAT_DB_PATH` env var honored as override.
 - **Auth**: bind to `127.0.0.1` only in v1. No auth.

+## Behavioral defaults (locked in v1 brainstorm round 2)
+
+- **Significance scale**: 0=Routine, 1=Notable, 2=Significant, 3=Pivotal. Score-3 turns auto-pin per witness. Drives retrieval ranking, compression, JSON exports.
+- **Edge updates**: per-turn deltas (`affinity_delta`, `trust_delta`, `knowledge_facts`, `last_interaction`); per-scene-close summary rewrite. Every mutation goes through the event log as `edge_update`.
+- **Classifier failure handling**: Pydantic-constrained → 1 retry with stricter reminder → schema-default fallback. 10s timeout. Never block the play loop. Refusals trigger fallback-model swap for that one call. Failures logged to `classifier_failures` table.
+- **Activity verbs**: open string + classifier-extracted `interruptible`, `required_attention`, `expected_duration`. Attention is optional free-form; omit from prompt when empty.
+- **Containers**: parse-and-extend. Per-chat scoped. Kickoff parse seeds initial; transitions create new.
+- **Pinning**: soft cap 8 / bot. Pivotal (score 3) = auto-pin. Manual pins never auto-evicted.
+- **Snapshots**: periodic every 100 events / 30 min; pre-rewind always. 5 periodic retained; pre-rewind retained 14 days.
+- **Streaming**: Stop button on streaming row; mid-stream disconnect commits partial with `truncated: true`; Send disabled mid-stream; multi-tab streaming via per-chat SSE channel.
+- **Display**: lightweight markdown; `*action*` italic; OOC `((parens))` shown dimmed/italic, never sent to bot.
+
 ## Core concepts (vocabulary)

 - **Entity**: `you | botA | botB`. Has identity (immutable), state (mood/goals/status), activity, per-POV memory.
@@ -93,6 +93,40 @@ When the assembled prompt exceeds the soft target, trim in this order — never
  - Dialogue turns beyond the last 4 (replace older turns with a one-line summary)
  - Per-POV summary of the previous scene

+### 3.3 Classifier failure handling
+
+The classifier does ~5 different jobs (turn parse, scene-close detection, interjection, significance, state-update). Any can fail (malformed JSON, refusal, timeout). The play loop must never block on classifier errors.
+
+1. **Constrained output first.** Use Pydantic models for every classifier call, passed through `instructor` (or Featherless-native JSON-schema mode if available).
+2. **One retry on parse failure** with a stricter "respond with JSON only" reminder appended. Same prompt, same model.
+3. **Schema-default fallback** on second failure:
+   - turn-parse default: treat the whole turn as one `dialogue` segment.
+   - scene-close default: don't close.
+   - interjection default: don't interject.
+   - significance default: 1 (Notable). Conservative.
+   - state-update default: no deltas, no facts; just bump `last_interaction`.
+4. **Log every fallback** to a `classifier_failures` table (`event_id, kind, raw_text, attempt_count`). Drawer dev panel surfaces a count.
+5. **Timeout**: 10s per classifier call. Beyond that, fall back. Narrative model has no orchestrator-imposed timeout (let it stream).
+6. **Refusal detection**: if the response starts with a refusal pattern ("I can't", "I'm sorry, but…") and isn't valid JSON, treat as a parse failure and retry with the JSON-only reminder. If it refuses again, swap to the next model in the fallback chain (`dolphin-2.9.4-llama3-8b` first), automatically, for that one call. Log it.
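
The retry-then-default flow in steps 1 to 5 can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: it uses stdlib `json` in place of Pydantic / `instructor`, the `classify` function, the `call_model` callable, the job keys, and the field names are all hypothetical, and the fallback-model swap from step 6 is left out. Treating a timeout as "skip the retry" is one assumed reading of step 5.

```python
import copy
import json
from typing import Callable

# Schema defaults from step 3. Job names and field names are illustrative.
SCHEMA_DEFAULTS = {
    "scene_close": {"close": False},
    "interjection": {"interject": False},
    "significance": {"score": 1},
    "state_update": {"affinity_delta": 0, "trust_delta": 0, "knowledge_facts": []},
}
JSON_REMINDER = "\n\nRespond with JSON only."


def classify(job: str, prompt: str, call_model: Callable[[str], str],
             failures: list) -> dict:
    """Return parsed classifier output, or the schema default after failure."""
    required = SCHEMA_DEFAULTS[job].keys()
    raw, attempts = "", 0
    for reminder in ("", JSON_REMINDER):
        attempts += 1
        try:
            raw = call_model(prompt + reminder)
        except TimeoutError:
            break  # assumed reading of step 5: a timeout skips the retry
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON or a prose refusal: try once more
        if isinstance(parsed, dict) and required <= parsed.keys():
            return parsed
    # Stand-in for a row in the classifier_failures table.
    failures.append({"kind": job, "raw_text": raw, "attempt_count": attempts})
    return copy.deepcopy(SCHEMA_DEFAULTS[job])
```

The caller always gets a usable dict back, which is what keeps the play loop from blocking; the only observable trace of a failure is the appended log entry.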
+
+### 3.4 Edge update granularity
+
+Edges drift smoothly turn-to-turn so retrieval and prompt assembly always see current values; long-form `summary` only churns at compression boundaries.
+
+**Per turn (cheap, small):** the post-utterance state-update classifier on each present entity produces a delta:
+
+- `affinity_delta` (signed integer from -3 to +3, applied to a 0–100 scale; conservative).
+- `trust_delta` (same shape).
+- `knowledge_facts: [string]` — new things this entity now knows about the target.
+- `last_interaction` is bumped to the current chat-id + chat-clock unconditionally.
+
+**Per scene close (expensive, summarizing):**
+
+- Edge `summary` rewritten by the classifier from the per-POV summary of the closing scene plus the prior summary. Aggregates the meaningful arc.
+- The "shared private moments" list gains one entry for each closed scene with significance ≥ 2.
+
+Every edge mutation goes into the event log as an `edge_update` event so rollback works.
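
As a rough sketch of the per-turn path, the snippet below applies one delta to an edge and records it as an `edge_update` event. The `Edge` dataclass, the event dict shape, clamping out-of-range deltas to ±3, and de-duplicating facts are assumptions made for illustration; the spec only fixes the field names and the 0–100 scale.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Edge:
    affinity: int = 50   # 0-100 scale
    trust: int = 50      # 0-100 scale
    knowledge_facts: list = field(default_factory=list)
    last_interaction: Optional[tuple] = None  # (chat_id, chat_clock)


def clamp(value: int, low: int, high: int) -> int:
    return max(low, min(high, value))


def apply_turn_delta(edge: Edge, delta: dict, chat_id: str,
                     chat_clock: str, event_log: list) -> None:
    """Apply one per-turn delta and append an edge_update event."""
    prior = {"affinity": edge.affinity, "trust": edge.trust}
    # Assumption: deltas outside -3..+3 are clamped rather than rejected.
    edge.affinity = clamp(edge.affinity + clamp(delta.get("affinity_delta", 0), -3, 3), 0, 100)
    edge.trust = clamp(edge.trust + clamp(delta.get("trust_delta", 0), -3, 3), 0, 100)
    new_facts = [f for f in delta.get("knowledge_facts", [])
                 if f not in edge.knowledge_facts]
    edge.knowledge_facts.extend(new_facts)
    # last_interaction is bumped unconditionally, even for an empty delta.
    edge.last_interaction = (chat_id, chat_clock)
    event_log.append({
        "type": "edge_update",
        "prior": prior,  # kept so a rollback can restore the old values
        "affinity": edge.affinity,
        "trust": edge.trust,
        "added_facts": new_facts,
        "last_interaction": edge.last_interaction,
    })
```

Storing the prior values on the event is one way to make the mutation reversible; replaying the log from a snapshot would work as well.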
+
 ## 4. Data Model (top-level entities)

 - **Bot** — top-level persistent unit. Has identity (immutable per session), state (mood/goals/status), per-bot clock, kickoff spec.
@@ -116,13 +150,30 @@ When BotB guests in BotA's chat, the scene runs on **BotA's** chat clock; memori

 Bots are top-level. Bot count in the system is unbounded. Per-scene cap is 2 bots (you + 1 host + optional 1 guest, or you + 1 host with no guest).

+### 4.3 Chat clock format (storage vs display)
+
+- **Stored** as a precise UTC datetime (ISO 8601). Initialized at kickoff to a sensible default (the kickoff time mentioned in prose, or a "now in-fiction" fallback).
+- **Displayed** in the chat header as a friendly relative format: "Tuesday evening, 9:14pm" or "Day 3 — late afternoon". Helps preserve fictional feel. Computed from the stored datetime + the chat's narrative anchor (the in-fiction date corresponding to "Day 1").
+- The drawer shows the precise stored datetime alongside the friendly form.
+- Time skips advance the stored datetime; the friendly format re-renders.
+- This split lets the system do precise time arithmetic (last-interaction calculations, Phase 3 event triggers) while showing humane time to the user.
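
To make the storage/display split concrete, here is a small sketch that derives the weekday-style label and the day count from a stored ISO 8601 string. The function names and the hour boundaries for morning, afternoon, evening, and night are assumptions; the spec does not define them.

```python
from datetime import date, datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def part_of_day(hour: int) -> str:
    # Boundaries are illustrative, not taken from the spec.
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 22:
        return "evening"
    return "night"


def friendly_clock(stored_iso: str) -> str:
    """Render a stored UTC datetime in the weekday-style friendly form."""
    dt = datetime.fromisoformat(stored_iso)
    hour12 = dt.hour % 12 or 12
    suffix = "am" if dt.hour < 12 else "pm"
    return (f"{WEEKDAYS[dt.weekday()]} {part_of_day(dt.hour)}, "
            f"{hour12}:{dt.minute:02d}{suffix}")


def day_number(stored_iso: str, day_one: date) -> int:
    """Count in-fiction days, where the narrative anchor is Day 1."""
    return (datetime.fromisoformat(stored_iso).date() - day_one).days + 1


print(friendly_clock("2026-04-28T21:14:00+00:00"))  # Tuesday evening, 9:14pm
print(day_number("2026-04-28T21:14:00+00:00", date(2026, 4, 26)))  # 3
```

Because both labels are pure functions of the stored value, a time skip only has to update the datetime and the header re-renders on its own.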
+
 ## 5. Authoring

 Authoring is structured, done in a form-based UI. A bot is created once, then edited through reset (full wipe) or by amending immutable identity fields directly.

 ### 5.1 Authored fields per bot

- **Identity (immutable per session):** name, persona paragraph, voice samples (1–3 short prose samples in the bot's voice), trait list, backstory.
+**Identity (immutable per session):**
+
+- **Name** (required).
+- **Persona paragraph** — free-form prose; goes into must-include identity block.
+- **Voice samples** — 1–3 short prose passages in the bot's voice. Stored joined with `---` separators. Injected into the speaker prompt as a must-include block: *"Voice reference. Match this register, vocabulary, and rhythm: {samples}"*. Always sent in full in v1 (rotation strategy is Phase 2 fallback if budget pressure forces it).
+- **Trait list** — free-form list of comma- or newline-separated phrases ("introverted, quick to anger, terrible at small talk, loves cats"). 3–15 items typical, no hard cap. Stored as `traits: list[str]`. Rendered in prompt as `Traits: …`. Not Big Five / MBTI — free-form anchors voice without biasing the model.
+- **Backstory** — free-form prose, single string. Soft target 100–500 words; no hard cap, but trim tier downgrades long backstories under budget pressure.
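
A possible shape for the trait and voice-sample handling, assuming the form submits traits as one raw string and voice samples as a list. The helper names are hypothetical, and joining with `---` on its own line is one guess at what "joined with `---` separators" means.

```python
import re


def parse_traits(raw: str) -> list[str]:
    """Split comma- or newline-separated phrases, dropping empty entries."""
    return [part.strip() for part in re.split(r"[,\n]", raw) if part.strip()]


def join_voice_samples(samples: list[str]) -> str:
    """Join 1 to 3 samples with '---' separators for storage."""
    if not 1 <= len(samples) <= 3:
        raise ValueError("expected 1 to 3 voice samples")
    return "\n---\n".join(sample.strip() for sample in samples)


def render_voice_block(samples: list[str]) -> str:
    """Build the must-include voice block for the speaker prompt."""
    return ("Voice reference. Match this register, vocabulary, and rhythm: "
            + join_voice_samples(samples))


def render_traits(traits: list[str]) -> str:
    return "Traits: " + ", ".join(traits)
```

For example, `parse_traits("introverted, quick to anger\nloves cats")` yields three phrases, which `render_traits` turns back into a single `Traits: …` line.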
+
+**Per-bot relationship & first encounter:**
+
 - **Initial relationship to you:** free-form prose ("BotA is my coworker; we've worked together for two years; she has a crush on me she hasn't admitted"). Parsed once into seeded `you ↔ bot` edge content on first run.
 - **Kickoff scene:** free-form prose describing the first encounter ("you stay late at the office; only you and BotA are there; she's at her desk pretending to work"). Parsed on first init into structured container, activity, seed edge content, and initial scene state. The user confirms or edits the parsed result before play begins.

@@ -141,6 +192,14 @@ Reset is a per-bot action with hard confirmation (type the bot's name).
 - Preserves: identity, initial edge seed, kickoff scene.
 - After reset, the chat **sits ready** — kickoff does not auto-play. The next user message triggers kickoff.

+### 5.4 "You" entity authoring (one-time, shared across all chats)
+
+- **Name** (required).
+- **Pronouns** (optional, freeform string; if empty, model infers).
+- **Persona blob** (optional, recommended) — short paragraph for context ("30, software engineer, lives alone, dry sense of humor"). Provides backdrop the bots can use without overriding per-bot framing.
+
+No voice samples for "you": the user provides their own dialogue, so the bot doesn't need to mimic the user's voice. Per-bot specifics about how each bot knows you stay in the per-bot "initial relationship to you" field (§5.1).
+
 ## 6. Play Loop

 ### 6.1 Input convention (mixed prose, novel-style)
@@ -181,10 +240,42 @@ A collapsible right-side drawer. **Closed by default.** When open, shows for the
 - Current scene + container.
 - Activity record per present entity.
 - Edges (host ↔ you, host ↔ guest if any).
- Recent witnessed memories from the host's POV.
+- Recent witnessed memories from the host's POV (with significance markers `·` `•` `★` `★★`).
+- **Pinned memories** in their own section with a `n / 8` counter and an unpin affordance per row.
 - Active threads and currently active events (Phase 3).
+- Snapshots panel (Phase 4 surface; data exists from Phase 1).

-Read-only by default; each row has an edit affordance for surgically fixing things the LLM got wrong (full edit surface lands progressively across phases).
+**v1 editable fields** (highest-value rescues for LLM errors):
+
+- **Activity record**: `action.verb`, `attention`, `posture` (text fields). Fixes "leaning on the kitchen counter while in a car".
+- **Edges**: `affinity`, `trust` (sliders 0–100), `summary` (textarea), `knowledge_facts` (add/remove list). Fixes misread moments.
+- **Memory**: `pov_summary` (textarea), `significance` (dropdown 0–3), pin toggle. Fixes wrong summaries.
+
+**Read-only in v1** (Phase 4 makes editable):
+
+- Container properties.
+- Identity fields (immutable per session by design — change via reset).
+- Witness flags (rewriting these silently changes continuity logic).
+- Structural fields (`owner_id`, `scene_id`, `chat_id`).
+
+Every drawer edit goes through the event log as a `manual_edit` event capturing the prior value, so it is fully reversible.
+
+### 6.5 Activity record specifics
+
+The activity record per entity carries:
+
+- `current_action.verb` — **open string** ("driving", "writing an email", "lying in bed reading"). No enum.
+- `current_action.interruptible` — bool. Classifier-extracted alongside the verb on each activity update.
+- `current_action.required_attention` — `low | medium | high`. Classifier-extracted.
+- `current_action.expected_duration` — text estimate ("a few minutes", "an hour", "ongoing"). Classifier-extracted.
+- `current_action.started_at` — chat-clock snapshot when the action begins.
+- `posture` — `standing | sitting | lying | …` (open string; classifier-extracted).
+- `position` — `{container_id, slot_name}`.
+- `holding` — list of items (open strings).
+- `attention` — **optional free-form string** ("the road", "her phone", "you", "the document"). Set only when prose makes attention explicit; left empty otherwise. When empty, prompt assembly omits the field rather than rendering "attention: unknown".
+- `status` — `{conscious, sober, injured, …}` (open dict).
+
+The activity block in the prompt renders verb + structured properties + posture + attention (if set) so the narrative model has both the natural-language verb and the structured constraints to dramatize ("BotA is driving — high attention, not interruptible, attention on the road").
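
One way the activity block might be rendered, with `attention` dropped when it is empty. The dict layout mirrors the field list above, but the `render_activity` name and the exact punctuation of the output line are assumptions.

```python
def render_activity(name: str, activity: dict) -> str:
    """Render verb, structured properties, posture, and attention (if set)."""
    action = activity["current_action"]
    parts = [
        f"{action['required_attention']} attention",
        "interruptible" if action["interruptible"] else "not interruptible",
    ]
    if activity.get("posture"):
        parts.append(activity["posture"])
    # Omit the field entirely when empty; never emit "attention: unknown".
    if activity.get("attention"):
        parts.append(f"attention on {activity['attention']}")
    return f"{name} is {action['verb']}: {', '.join(parts)}"


driving = {
    "current_action": {"verb": "driving", "interruptible": False,
                       "required_attention": "high"},
    "posture": "sitting",
    "attention": "the road",
}
print(render_activity("BotA", driving))
```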

 ## 7. Scene Lifecycle

@@ -215,6 +306,26 @@ False positives are the bigger risk than false negatives, so the auto-detector e
 5. Raw dialogue archived to cold storage.
 6. New active scene is opened (resume or fresh).

+### 7.4 Container authoring (parse-and-extend)
+
+- **Kickoff parse** creates the initial container for a bot's chat from the kickoff prose ("you stay late at the office" → container `office`, type `workplace`, public, slots auto-named from context).
+- **Transitions during play** ("we drove to the park", "let me grab a drink from the kitchen") are detected by the scene-close classifier; if a container with that name doesn't exist yet in this chat, it's created on the fly with classifier-inferred defaults; if it does, it's reused.
+- **Drawer** shows the current container with all fields editable (Phase 4) or read-only with manual creation (v1). New containers can be pre-authored in the drawer if you want.
+- **Schema**: `containers(id, chat_id, name, type, properties_json, parent_id)` where `properties_json` holds `{moving: bool, public: bool, audible_range: text, slots: [{name, occupant_id?}]}`.
+- **Containers are scoped per chat.** BotA's "apartment" and BotC's "apartment" are distinct records — no name collision.
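
The parse-and-extend rule amounts to a get-or-create keyed on chat and name. The sketch below uses stdlib `sqlite3` with the column list from the schema bullet; the column types, the `UNIQUE (chat_id, name)` constraint, and the default property values are assumptions, since column-level details are still deferred.

```python
import json
import sqlite3
from typing import Optional


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE containers (
            id INTEGER PRIMARY KEY,
            chat_id TEXT NOT NULL,
            name TEXT NOT NULL,
            type TEXT,
            properties_json TEXT NOT NULL,
            parent_id INTEGER REFERENCES containers(id),
            UNIQUE (chat_id, name)  -- assumed: names are unique within a chat
        )""")
    return conn


def get_or_create_container(conn: sqlite3.Connection, chat_id: str, name: str,
                            inferred_type: Optional[str] = None,
                            properties: Optional[dict] = None) -> int:
    """Reuse the chat's container with this name, or create it on the fly."""
    row = conn.execute(
        "SELECT id FROM containers WHERE chat_id = ? AND name = ?",
        (chat_id, name)).fetchone()
    if row:
        return row[0]
    props = properties or {"moving": False, "public": False,
                           "audible_range": "", "slots": []}
    cur = conn.execute(
        "INSERT INTO containers (chat_id, name, type, properties_json) "
        "VALUES (?, ?, ?, ?)",
        (chat_id, name, inferred_type, json.dumps(props)))
    return cur.lastrowid
```

Calling it twice with the same chat and name returns the same id, while the same name in another chat produces a separate record.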
+
+### 7.5 Guest leaves mid-scene
+
+Adding a guest then having them leave is a normal flow. What happens:
+
+- **The user removes the guest** via the drawer ("Remove BotB from scene"), or BotB exits in prose ("BotB grabs her coat and heads out") — the classifier detects exit on the next turn and prompts for confirmation.
+- **The current scene closes** at the moment of guest exit (this counts as a hard close signal). Per-POV summaries are written for all three witnesses **including BotB**. Edges update.
+- **A new scene immediately opens** with just you + host bot, in the same container, with carry-over activity. No need to re-narrate "now we're alone".
+- **Witness flags from the closed scene** stay `[1, host, guest]` for the period BotB was present — that data is permanent and travels with the memory.
+- **The group node** persists for the chat (its content is updated on close, not deleted), available for future co-presences in this chat. If BotB never returns, the group node just sits unused.
+- **BotB's chat clock** is unchanged — it remains wherever it was when last visited (per §4.1).
+- **BotB's memories of the scene** are written to BotB's memory store, available next time you talk to BotB alone.
+
 ## 8. Memory Retrieval

 ### 8.1 Always-loaded (no retrieval cost)
@@ -238,6 +349,15 @@ A bot **cannot** retrieve memories whose witness bit for them is `0`. Period. Th

 A bot's memory store contains memories from any chat the bot has been in (host or guest). All are retrievable. Bots may reference cross-chat events naturally; precise cross-chat time arithmetic is not attempted.

+### 8.5 Pinning
+
+- User pins via the drawer (pin icon next to a memory row).
+- Pinned memories are **always-loaded** for the speaker (no retrieval cost).
+- **Soft cap: 8 pins per bot.**
+- Pivotal-significance (score 3) memories are **auto-pinned** for the witness whose POV they're in.
+- When over cap, the **oldest auto-pin** is unpinned (not deleted — just removed from the pinned set; still findable via retrieval). **Manual pins are never auto-evicted** — user must unpin manually.
+- Drawer shows pinned memories in their own section with a `n / 8` counter and an unpin affordance per row.
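
The eviction rule can be sketched as below. The `Pin` dataclass, the `pinned_at` sequence number, and the `add_pin` helper are hypothetical; only the cap of 8, the oldest-auto-pin eviction, and the protection of manual pins come from the rules above.

```python
from dataclasses import dataclass

PIN_CAP = 8


@dataclass
class Pin:
    memory_id: str
    manual: bool     # True for user pins, False for score-3 auto-pins
    pinned_at: int   # increasing sequence number; lower means older


def add_pin(pins: list, new_pin: Pin, cap: int = PIN_CAP):
    """Return (updated pins, evicted memory ids) after adding new_pin."""
    pins = pins + [new_pin]
    evicted = []
    while len(pins) > cap:
        autos = [p for p in pins if not p.manual]
        if not autos:
            break  # soft cap: manual pins are never auto-evicted
        oldest = min(autos, key=lambda p: p.pinned_at)
        pins.remove(oldest)  # unpinned only; the memory itself is untouched
        evicted.append(oldest.memory_id)
    return pins, evicted
```

Under this reading, if all eight existing pins are manual, a fresh auto-pin is itself the oldest auto-pin and is evicted at once; the spec does not say whether that is intended.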
+
 ## 9. Time, Skips, Events (Phase 3 surface)

 - Each chat has its own clock; advances **only** on explicit user skip commands within that chat.
@@ -249,25 +369,98 @@ Phase 1 has no skips and no events. Time is set at kickoff and stays put unless

 ## 10. Rollback, Regenerate, Reset

- **Rewind to here** — button on every turn. Truncates event log past that turn; rebuilds projection. Confirmation modal shows turn count + scene transitions affected. Always snapshots pre-rewind for "undo rewind".
- **Regenerate this turn** — button on the latest bot turn. **Edit-then-regenerate**: the user may edit their preceding turn before re-running. Replaces the old `assistant_turn` event with a new one carrying the new outcome. Downstream classifier passes (state updates, significance) re-run on the new output.
- **Reset bot** — full wipe with hard confirm (type bot name). Behavior detailed in §5.3.
- **Branching, hide-from-view, surgical delete + cascade with impact preview** — Phase 4. Mechanically supported by event sourcing already; no UI yet.
+### 10.1 Rewind
+
+- Button on every turn. Truncates event log past that turn; rebuilds projection.
+- **Pre-rewind snapshot** is always taken automatically before truncating. Stored in `data/snapshots/rewind/`. Retained 14 days then pruned.
+- **Confirmation modal** shows an impact preview:
+
+  ```
+  Rewind to turn 47?
+
+  This will remove:
+    • 12 messages (turns 48–59)
+    • 1 scene transition (drive to park)
+    • 2 edge updates (BotA → You, Group)
+    • 3 memories from BotA's store
+    • 1 fired event (arrived at park)
+    • 1 manual edit (BotA affinity)
+
+  A pre-rewind snapshot will be saved automatically.
+
+  [ Cancel ]   [ Rewind ]
+  ```
+
+- After successful rewind, a 30-second toast appears: **"Rewound 12 turns. [Undo]"** — clicking restores from the pre-rewind snapshot. After the toast dismisses, the snapshot is still on disk and reachable from the drawer's snapshots panel (Phase 4 UI; data is there from Phase 1).
+- "Rewind, keep current as branch" is Phase 4.
+
+### 10.2 Regenerate (inline, not modal)
+
+Clicking **Regenerate** on the latest bot turn:
+
+1. Scrolls to your last user turn and puts it into an inline edit state (textarea pre-filled with your prose).
+2. The bot's response below shows a faded "regenerating…" placeholder.
+3. Submit button is **Regenerate** (not Send). Hitting it:
+   - If you edited: appends a `user_turn_edit` event capturing the new prose, then a new `assistant_turn` event with the new generated response.
+   - If you didn't edit: appends only a new `assistant_turn` event.
+4. The previous `assistant_turn` event is **superseded**, not deleted — kept in the log with a `superseded_by` pointer so it's recoverable. Display hides it.
+5. Downstream classifier passes (state-update, significance) re-run on the new response.
+6. **[ Cancel ]** reverts to the original (no event written).
+
+### 10.3 Reset bot
+
+Full wipe with hard confirm (type bot name). Behavior detailed in §5.3.
+
+### 10.4 Snapshot frequency & retention
+
+- **Periodic snapshot**: every **100 events** OR every **30 minutes** of activity, whichever comes first. Stored in `data/snapshots/periodic/`.
+- **Retention**: keep last 5 periodic snapshots. Pre-rewind snapshots retained 14 days.
+- **Cold-load behavior**: replay starts from the most recent periodic snapshot, then applies events forward. Bounds replay cost on app start.
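
The frequency and retention rules reduce to three small predicates, sketched here. The function names are hypothetical, how "activity" time is measured is left to the caller, and the pruning helper assumes snapshot names sort chronologically (for example, a timestamp prefix).

```python
from datetime import datetime, timedelta


def should_snapshot(events_since_last: int, active_elapsed: timedelta) -> bool:
    """Periodic trigger: 100 events or 30 minutes, whichever comes first."""
    return events_since_last >= 100 or active_elapsed >= timedelta(minutes=30)


def periodic_to_prune(snapshot_names: list, keep: int = 5) -> list:
    """Return the periodic snapshots that fall outside the last `keep`."""
    ordered = sorted(snapshot_names)
    return ordered[:-keep] if len(ordered) > keep else []


def rewind_snapshot_expired(created_at: datetime, now: datetime) -> bool:
    """Pre-rewind snapshots are pruned after 14 days."""
    return now - created_at > timedelta(days=14)
```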
+
+### 10.5 Deferred to Phase 4
+
+Branching with UI, hide-from-view soft delete, surgical delete + cascade with impact preview. Mechanically supported by event sourcing already; no v1 UI surface.

 ## 11. Compression & Promotion

- **Significance pass** — classifier call after each turn (queued, async) tags the turn 0–3.
- **Per-POV summaries** — written per witness when a scene closes. Different details, different interpretations. No omniscient narration.
- **Promotion rules (the "picnic basket" rule):**
-  - Object acquired → entity inventory.
-  - Knowledge gained → relevant edge's `knowledge`.
-  - Relationship change → edge `summary`.
-  - **Everything else stays in the closed event/scene record.** Surfaces only on explicit recall.
- **Compression tiers:**
-  - Last scene: full dialogue retained.
-  - Recent scenes: per-POV summary + key quotes.
-  - Older scenes: per-POV summary only.
-  - Distant past: rolled into edge summaries.
+### 11.1 Significance rubric
+
+Classifier call after each turn (queued, async) tags the turn 0–3. Scene significance is the max turn-significance within the scene.
+
+| Score | Name | Definition | Examples |
+|------:|------|------------|----------|
+| **0** | Routine | Banter, small talk, ordinary action. Forgettable on its own; aggregates only via edge stats. | "Hi, how was your day?" / "Fine, you?" |
+| **1** | Notable | A specific detail, opinion, or beat worth remembering but not arc-changing. Default for non-trivial dialogue. | BotA mentions a band she likes; you discover BotB hates a food. |
+| **2** | Significant | A scene-level moment — meaningful confession, real disagreement, a date, a confided secret. | First date; BotA tells you about her sister; an argument. |
+| **3** | Pivotal | A relationship-altering event. Updates edge `summary` and (often) `affinity` substantially. Always auto-pinned. | First kiss; betrayal; "I love you"; learning a defining secret. |
+
+**Where each level is used:**
+
+- **Retrieval ranking**: significance multiplier applied as `score × constant` to FTS / vector rank.
+- **Compression**: scenes with max-turn-significance ≥ 2 retain key quotes; ≤ 1 collapse fully into the per-POV summary.
+- **Exports**: scene-close JSON written to `data/exports/` when scene significance ≥ 2.
+- **Auto-pin**: turns scored 3 are auto-pinned for each witness whose POV they're in.
+- **UI hint**: drawer renders score as `·`, `•`, `★`, `★★`.
+
+**Tie-breakers**: turn significance is the max across emotional, factual, and relational facets; the classifier returns the max. Conservative bias — when uncertain, score lower.
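
As a compact restatement of the thresholds above, the following sketch derives the scene-level decisions from a list of turn scores. The helper names and the returned dict shape are illustrative, and treating an empty scene as score 0 is an assumption.

```python
MARKERS = {0: "·", 1: "•", 2: "★", 3: "★★"}


def scene_significance(turn_scores: list) -> int:
    """Scene significance is the max turn significance within the scene."""
    return max(turn_scores, default=0)


def scene_close_policy(turn_scores: list) -> dict:
    """Collect the per-scene decisions that depend on significance."""
    score = scene_significance(turn_scores)
    return {
        "score": score,
        "marker": MARKERS[score],
        "retain_key_quotes": score >= 2,  # otherwise collapse into the summary
        "write_export": score >= 2,       # scene-close JSON in data/exports/
        "auto_pin_turns": [i for i, s in enumerate(turn_scores) if s == 3],
    }
```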
+
+### 11.2 Per-POV summaries
+
+Written per witness when a scene closes. Different details, different interpretations. **No omniscient narration.**
+
+### 11.3 Promotion rules ("picnic basket" rule)
+
+- Object acquired → entity inventory.
+- Knowledge gained → relevant edge's `knowledge`.
+- Relationship change → edge `summary`.
+- **Everything else stays in the closed event/scene record.** Surfaces only on explicit recall.
+
+### 11.4 Compression tiers
+
+- Last scene: full dialogue retained.
+- Recent scenes: per-POV summary + key quotes (only if significance ≥ 2; else summary only).
+- Older scenes: per-POV summary only.
+- Distant past: rolled into edge summaries.

 ## 12. Persistence & Ops (v1 defaults)

@@ -333,19 +526,18 @@ Phase 1 has no skips and no events. Time is set at kickoff and stays put unless

 ## 14. Open / Deferred Decisions

-Resolved by this brainstorm (now reflected in §3 / §6 / §12 above):
- ~~Classifier model name~~ → `NousResearch/Hermes-3-Llama-3.1-8B`, with documented fallback chain.
- ~~Token budget tier strategy~~ → §3.2 (8K / 6K narrative, 4K classifier; must / should / nice tiers).
- ~~UI framework~~ → FastAPI + HTMX + SSE, multi-tab sync as a Phase 1 requirement (§3.1).
- ~~OOC marker~~ → `((double parens))`, configurable.
- ~~DB file location~~ → project-folder `<repo>/data/` tree (§12).
+All round-1 and round-2 brainstorm decisions are resolved and folded into §3–§12 / §16. Honest list of what is still deferred after this brainstorm:

-Still deferred:
 - **Embedding model** (Phase 4 — pick whatever's cheap and good enough on Featherless or local at the time).
- **sqlite-vss vs sqlite-vec** (Phase 4 — pick based on the projects' state at the time).
- **Significance scoring rubric** — what does 0/1/2/3 mean? Drafted during Phase 1 against real scenes.
- **Activity-record action verbs** — open vocabulary or constrained list? Decided during Phase 1 implementation.
- **Drawer edit-affordance UX** — which fields editable in v1, which slip to Phase 1.5 / Phase 4.
+- **sqlite-vss vs sqlite-vec** (Phase 4 — pick based on each project's state at the time).
+- **Exact prompt templates** for narrative + each classifier job — drafted against real prompts during Phase 1.
+- **Schema column-level types and FK details** — finalized during Phase 1 schema implementation.
+- **Token-counting accuracy** — accept ~5% drift from `tiktoken` cl100k vs actual Mistral / Llama tokenizers; revisit if drift causes real budget overruns.
+- **"Snapshots" panel UX** in the drawer — Phase 4. Data is written from Phase 1, read-back UI lands later.
+- **In-fiction time auto-advance during a scene** — Phase 1 freezes the chat clock between user-initiated skips. If this feels stale during Phase 1 play, revisit with model-narrated parsing in Phase 3 (§9).
+- **Search across chat history** — out of scope for v1. Phase 4 if needed.
+- **Avatars / portraits** — out of scope (multimodality is deferred indefinitely).
+- **Performance targets** — measured against real prompts in Phase 1; no preset SLOs.

 ## 15. Non-Negotiables (rules every implementer must respect)

@@ -358,6 +550,58 @@ Still deferred:
 7. **Streaming on the inference path; non-blocking bookkeeping** runs while the LLM streams.
 8. **No extra services.** SQLite + a process. Push back on suggestions to add infrastructure.

+## 16. UI Shape & Flow
+
+### 16.1 Top-level navigation
+
+Persistent left rail with three sections:
+
+- **Chats** (default). List of every authored bot, each row showing: bot name, last message snippet, last-played-at (real time), unread/idle indicator, current chat clock (in-fiction time). Click → opens that chat. Single active chat per browser tab.
+- **Bots**. Library view. List of authored bots with thumbnails (initials/avatar later), edit / reset / delete actions per bot. "New bot" button at the top.
+- **Settings**. Single screen: Featherless API key, model overrides, OOC marker, K, token budgets, "you" entity authoring, theme.
+
+Top-of-rail: "+" button creates a new bot (jumps to authoring); after author + kickoff, you land in the new chat.
+
+### 16.2 First-run experience
+
+On fresh install (empty DB):
+
+1. App boots → **"Set up your profile"** screen — fills in the "you" entity (name, pronouns, persona blob).
+2. After save → **"Add your first bot"** — bot authoring form with inline guides.
+3. After save → **kickoff parse-and-confirm** — orchestrator parses kickoff prose, displays the parsed structured form (container, activity, seed edges) with edit affordances; user confirms.
+4. Lands in the chat, ready for first turn.
+
+Step 1 is skippable (revisit in Settings); step 2 is required (no chats without a bot).
+
+### 16.3 Display formatting
+
+- **Bot output** rendered as **lightweight markdown** — paragraphs, *italics*, **bold**, blockquotes. No headings or code blocks (foreign to RP prose).
+- **Action segments** (`*walks over*`) rendered as italics; dialogue rendered plain.
+- **User input** uses the same render rules. The textarea is plain — no live preview.
+- **OOC `((parens))`**: shown in the transcript with italic + dimmed + smaller font, set off from surrounding prose. Always visible to you (you should see your own meta-commentary). Stripped from the prompt the bot sees.
+- **Speaker labels**: bot turns prefixed with the bot's name in bold; your turns prefixed with your "you" name. Tight spacing.
+- **Stream rendering**: tokens append as they arrive; markdown re-rendered on each chunk. Cursor indicator at the trailing edge while streaming.
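
Stripping OOC spans before prompt assembly could look like the sketch below. It assumes the default `((double parens))` marker (the marker is configurable, so a real implementation would build the pattern from config) and does not try to handle a `))` nested inside an OOC span.

```python
import re

# Non-greedy match so two OOC spans in one turn are removed separately.
OOC_PATTERN = re.compile(r"\(\(.*?\)\)", re.DOTALL)


def strip_ooc(text: str) -> str:
    """Remove OOC spans from the text that is sent to the bot."""
    without = OOC_PATTERN.sub("", text)
    # Collapse the doubled spaces left behind where a span was removed.
    return re.sub(r"[ \t]{2,}", " ", without).strip()


print(strip_ooc("I wave. ((brb, phone)) Then I sit."))  # I wave. Then I sit.
```

The transcript keeps the original text so the user still sees their own meta-commentary; only the prompt copy is stripped.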
+
+### 16.4 Streaming UX
+
+- **Typing indicator**: the bot's row appears immediately with name + "…" pulse, then tokens fill.
+- **Stop button** on the streaming bot row halts mid-stream. On stop:
+  - Partial response is committed as a normal `assistant_turn` event with `truncated: true`.
+  - Downstream classifier passes still run on the partial text (state-update, significance) — partial is fine, the bot really did "say up to here".
+  - You can immediately type your next turn or click Regenerate to redo the whole response.
+- **Send-while-streaming**: disabled. Input box is locked during stream. Stop first.
+- **Mid-stream disconnect** (Featherless drops, network blip, page refresh): treated as an interrupt with `truncated: true`, partial text committed if any was received. UI surfaces "connection lost — partial response saved" banner with a Regenerate button.
+- **Multi-tab streaming**: tokens stream to all subscribed tabs simultaneously via the per-chat SSE channel (§3.1). Stop from any tab interrupts for everyone.
+
+### 16.5 Error UX surface
+
+- **Featherless API down / unauthorized / out of credits**: failed turn shows an error banner inline ("Featherless: 401 unauthorized" / "rate limited" / "service unavailable") with a Retry button. **No event committed** on hard failure — the turn never happened.
+- **Featherless slow (> 30s before first token)**: warning banner — "Slow response — still waiting…" with the same Stop button. No automatic abort.
+- **Classifier failure (after fallback ran)**: silent — fallback values used (per §3.3). Logged to `classifier_failures` table. Drawer dev panel surfaces a count. **No user-facing notification** in normal play.
+- **DB write failure** (rare; disk full, file lock): hard error modal — "Couldn't save your turn — fix and retry". Orchestrator does not advance state.
+- **Schema-migration failure on startup**: app refuses to launch, prints the error and the path to the DB. No automatic repair.
+- **Missing config / first-run with empty config**: redirected to Settings before first chat opens.
+
 ---

 ## Appendix A — Decisions Log (this brainstorm)
@@ -385,3 +629,25 @@ Still deferred:
 | 19 | Token budgets | Narrative 8K hard / 6K soft; classifier 4K hard. Must/Should/Nice tiers per §3.2 |
 | 20 | OOC marker | `((double parens))`, configurable |
 | 21 | DB location | Project-folder `<repo>/data/` tree (DB, backups, snapshots, exports, config). Gitignored. `CHAT_DB_PATH` env var honored |
+| 22 | Significance rubric | 0=Routine, 1=Notable, 2=Significant, 3=Pivotal. Used across retrieval, compression, exports, auto-pin, drawer. Score-3 turns auto-pinned per witness |
+| 23 | Edge update granularity | Per-turn deltas (affinity, trust, knowledge_facts, last_interaction); per-scene-close summary rewrite |
+| 24 | Classifier failure handling | Pydantic-constrained → 1 retry → schema-default fallback. Refusal triggers fallback model swap for that call. 10s timeout. Failures logged |
+| 25 | "You" entity | Name (req) + pronouns (opt) + persona blob (opt). Per-bot relationship handles bot-specific framing |
+| 26 | Voice samples | 1–3 samples per bot; always must-include in v1 |
+| 27 | Trait list | Free-form list of phrases (3–15 typical); not Big Five / MBTI |
+| 28 | Backstory | Free-form prose, 100–500 words target |
+| 29 | Pinning | Soft cap 8 / bot. Score-3 auto-pins. Manual pins never auto-evicted. Drawer surface with `n/8` counter |
+| 30 | Containers | Parse-and-extend: kickoff parse seeds initial container; transitions create new; per-chat scoped |
+| 31 | Activity verbs | Open string for `verb`; classifier extracts `interruptible`, `required_attention`, `expected_duration` alongside |
+| 32 | Attention | Optional free-form string; omitted from prompt when empty |
+| 33 | Drawer edit cut (v1) | Editable: activity (action/attention/posture), edges (affinity/trust/summary/knowledge_facts), memory (pov_summary/significance/pin). Read-only: container, identity, witness, structural |
+| 34 | Snapshots | Periodic every 100 events / 30 min; pre-rewind always; 5 periodic retained, pre-rewind 14 days |
+| 35 | Rewind UX | Modal with structured impact preview; pre-rewind snapshot auto-saved; 30s undo toast |
+| 36 | Regenerate UX | Inline (no modal). Edit-then-regenerate. Old `assistant_turn` superseded, not deleted |
+| 37 | Top-level nav | Three-section left rail: Chats / Bots / Settings |
+| 38 | First-run | You-profile → first-bot author → kickoff parse-and-confirm → chat |
+| 39 | Display formatting | Lightweight markdown; `*action*` italic; OOC dimmed/italic/smaller; speaker labels bold |
+| 40 | Chat clock format | Stored ISO 8601 UTC datetime; displayed friendly relative ("Tuesday evening, 9:14pm") |
+| 41 | Streaming UX | Typing indicator, Stop button, Send disabled mid-stream, multi-tab sync via SSE, mid-stream disconnect = truncated commit |
+| 42 | Error UX | Featherless errors inline w/ Retry; classifier fails silent w/ fallback; DB write fails modal-blocking; schema migration fails launch-blocking |
+| 43 | Guest leaves | Auto scene-close on guest exit; per-POV summaries for all 3 incl. guest; new scene immediately opens with you+host; group node persists for chat |