From 5869f1c5ce3681fd4c7afcdd04692bea9d5bd45b Mon Sep 17 00:00:00 2001
From: Joseph Doherty
Date: Sun, 26 Apr 2026 10:56:51 -0400
Subject: [PATCH] docs: lock remaining v1 design decisions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Resolves the open/deferred decisions from the v1 requirements
brainstorm: runtime stack, classifier model, token budgets, OOC
marker, data layout.

- Runtime: FastAPI + HTMX + SSE (multi-tab sync is a Phase 1
  requirement, not a polish item). 127.0.0.1 only, no auth in v1.
- Classifier model: NousResearch/Hermes-3-Llama-3.1-8B with documented
  fallback chain (dolphin-2.9.4-llama3-8b,
  Meta-Llama-3.1-8B-Instruct-abliterated).
- Token budgets: 8K hard / 6K soft for narrative, 4K hard for
  classifier; Must/Should/Nice trimming tiers spelled out in §3.2.
- OOC marker locked to ((double parens)), configurable.
- All runtime data lives under /data/ (DB, backups, snapshots,
  exports, config). Tree is gitignored. CHAT_DB_PATH env var honored.

CLAUDE.md and the requirements doc updated to match. Decisions log in
the requirements doc appendix extended with the new locks (#17–21).
---
 .gitignore                               |  3 +
 CLAUDE.md                                | 15 +++-
 .../2026-04-26-v1-requirements-design.md | 78 ++++++++++++++++---
 3 files changed, 83 insertions(+), 13 deletions(-)

diff --git a/.gitignore b/.gitignore
index e43b0f9..92172f9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1 +1,4 @@
 .DS_Store
+
+# v1 runtime data (DB, backups, snapshots, exports, config with secrets)
+data/
diff --git a/CLAUDE.md b/CLAUDE.md
index a52689a..972a037 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -23,9 +23,22 @@ The 3-entity cap is load-bearing: it makes the relationship graph fully enumerab
 ## Architecture
 
 - **Mac (always-on)**: web UI, orchestrator, persistence, event queue, retrieval, prompt construction, all state.
-- **Inference endpoint**: stateless `generate(prompt, params) -> text`. Swap implementations (cloud API, rented GPU, local MLX/llama.cpp) behind one interface. The orchestrator never knows which.
+- **Inference endpoint**: stateless `generate(prompt, params) -> text`. Swap implementations behind one interface. The orchestrator never knows which.
 - Streaming required for UX.
 
+## Runtime stack (locked for v1)
+
+- **Backend**: Python 3.11+ with **FastAPI**.
+- **Frontend**: server-rendered HTML + **HTMX** + minimal vanilla JS/CSS. No JS build chain.
+- **Live updates**: SSE per chat. Per-chat in-process pub/sub with one `asyncio.Queue` per subscribed tab (a single shared queue would deliver each event to only one tab). Multi-tab sync is a Phase 1 requirement — two browser tabs on the same chat must mirror each other live (streamed tokens, drawer state, edge updates).
+- **Inference backend**: **Featherless** (OpenAI-compatible API).
+  - `narrative_model` = `dphn/Dolphin-Mistral-24B-Venice-Edition` (32K ctx, uncensored).
+  - `classifier_model` = `NousResearch/Hermes-3-Llama-3.1-8B` (128K ctx, uncensored, structured-output reliable). Fallbacks: `cognitivecomputations/dolphin-2.9.4-llama3-8b` → `mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated`.
+- **Token budgets**: narrative 8K hard / 6K soft; classifier 4K hard. Trim tiers must / should / nice — never trim must-include.
+- **OOC marker**: `((double parens))` (configurable).
+- **Data layout**: everything under `/data/` — `chat.db`, `backups/`, `snapshots/`, `exports/`, `config.toml`. The whole tree is `.gitignore`d. `CHAT_DB_PATH` env var honored as override.
+- **Auth**: bind to `127.0.0.1` only in v1. No auth.
+
 ## Core concepts (vocabulary)
 
 - **Entity**: `you | botA | botB`. Has identity (immutable), state (mood/goals/status), activity, per-POV memory.
diff --git a/docs/plans/2026-04-26-v1-requirements-design.md b/docs/plans/2026-04-26-v1-requirements-design.md
index 0be5d9d..b236f89 100644
--- a/docs/plans/2026-04-26-v1-requirements-design.md
+++ b/docs/plans/2026-04-26-v1-requirements-design.md
@@ -23,8 +23,9 @@ The LLM is treated as a **renderer** for structured world state, not as the stat
 - **One chat per bot.** A second bot can be added as a *guest* into any chat. Hard cap: **2 bots in any scene**.
 - Explicit / mature content allowed.
 - **Featherless** as the LLM backend over its OpenAI-compatible API. Two model slots:
-  - `narrative_model` — Dolphin-Mistral-24B-Venice (uncensored, narrative-grade).
-  - `classifier_model` — small (~3B-class), TBD at Phase 1 start. Used for parsing, significance, interjection, scene-close detection, state-update passes.
+  - `narrative_model` — `dphn/Dolphin-Mistral-24B-Venice-Edition` (uncensored, narrative-grade). 32K context.
+  - `classifier_model` — `NousResearch/Hermes-3-Llama-3.1-8B` (uncensored, tuned for tool use / structured output). 128K context. Fallback chain if it underperforms on JSON: `cognitivecomputations/dolphin-2.9.4-llama3-8b` → `mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated`.
+  - Classifier is used for: turn parsing (dialogue/action/ooc), kickoff prose parsing, scene-close detection, interjection decisions, significance scoring, state-update extraction, jump-skip memory synthesis.
 
 ### 2.2 Out of scope (v1)
 
@@ -57,6 +58,41 @@ The orchestrator never knows which model is in use — only `generate(prompt, pa
 
 API key handling: keys live in a local config file outside the repository. **Never** commit a key to the repo, paste in chat logs, or include in exports.
 
+### 3.1 Runtime stack
+
+- **Backend**: Python 3.11+ with **FastAPI** as the HTTP server.
+- **Frontend**: server-rendered HTML + **HTMX** + minimal vanilla JS/CSS. No JS build chain.
+- **Live updates**: Server-Sent Events (SSE) per chat. Server keeps a per-chat in-process pub/sub channel: for each chat_id, one `asyncio.Queue` per open SSE connection, because a single shared queue would hand each event to only one tab. Every browser tab on `/chats/` opens an SSE connection to `/chats//events`. State changes (new turn, streamed tokens, drawer state, edge updates, scene close) are published to the channel, which copies each event into every subscriber's queue; all subscribed tabs receive the event and HTMX swaps the relevant DOM region.
+- **Multi-tab sync** is a Phase 1 requirement, not a polish item. Two browser tabs open to the same chat must mirror each other in real time. Implications:
+  - In-progress typing is tab-local until submit (no collaborative input in v1).
+  - On reconnect/refresh, the server first sends a "current state" snapshot, then resumes streaming.
+  - The same architecture trivially supports a phone or tablet on the LAN later — bind to `0.0.0.0` + add a shared-secret token if/when desired. Default is `127.0.0.1`, no auth.
+
+### 3.2 Token budgets and trimming tiers
+
+Token accounting uses `tiktoken` with the `cl100k_base` encoding as an approximation. The Mistral and Llama tokenizers diverge from it by ~5%; we accept the drift.
+
+- **Narrative prompt**: 8K hard ceiling, 6K soft target. Leaves ~2-4K headroom for streamed output and avoids long-context performance cliffs. Plenty for our prompt shape.
+- **Classifier prompt**: 4K hard ceiling. Most calls are well under 1K.
+
+When the assembled prompt exceeds the soft target, trim in this order — never trim must-include:
+
+- **MUST-include** (always present):
+  - System message + speaker identity
+  - Speaker's edge to the addressee
+  - Activity snapshot for all present entities
+  - Current scene description
+  - Last 4 turns of dialogue
+- **SHOULD-include** (trim when over budget):
+  - Other edges of the speaker (e.g. speaker → the other present entity)
+  - Group node summary (when applicable)
+  - Active threads
+  - Currently active events + props
+- **NICE-include** (trim first):
+  - Retrieved memories beyond top-2 (drop K=4 to K=2)
+  - Dialogue turns beyond the last 4 (replace older turns with a one-line summary)
+  - Per-POV summary of the previous scene
+
 ## 4. Data Model (top-level entities)
 
 - **Bot** — top-level persistent unit. Has identity (immutable per session), state (mood/goals/status), per-bot clock, kickoff spec.
@@ -113,7 +149,7 @@ A turn is free-form prose with conventional markers:
 
 - `*walks over*` — action.
 - Quoted or bare text — dialogue.
-- `((double parens))` — out-of-character commentary or meta-instruction. Flagged but not sent to the bot. (Default; configurable before play begins.)
+- `((double parens))` — out-of-character commentary or meta-instruction. Flagged but not sent to the bot. (Default; stored as a config field; the user may change it before play begins.)
 
 A small classifier call splits the turn into segments tagged `dialogue | action | ooc`. Action segments update the user's activity record.
 
@@ -236,9 +272,16 @@ Phase 1 has no skips and no events. Time is set at kickoff and stays put unless
 ## 12. Persistence & Ops (v1 defaults)
 
 - SQLite WAL mode, foreign keys on, transactional turns.
-- Single DB file. Default path TBD (likely `~/Library/Application Support/chat/chat.db`).
+- **Project-folder layout** (DB lives inside the repo, gitignored):
+  - DB: `/data/chat.db`
+  - Backups: `/data/backups/` (timestamped copies)
+  - Pre-rewind snapshots: `/data/snapshots/`
+  - Significant-scene JSON exports: `/data/exports/`
+  - Config: `/data/config.toml` (holds Featherless API key, model names, OOC marker, K, budget, etc. Gitignored.)
+  - The entire `data/` tree is in `.gitignore` so secrets and state never get committed.
+  - `CHAT_DB_PATH` env var honored as an override if you want to point at a different file (e.g., a backup or a sibling repo's data).
 - **Auto-backup** nightly via launchd. Timestamped copies. Last 14 retained. Pre-rewind snapshots are separate and not pruned.
-- **Significant-scene JSON exports** written to a sibling folder when scenes close at significance ≥ 2.
+- **Significant-scene JSON exports** written to `data/exports/` when scenes close at significance ≥ 2.
 - Schema versioned in a `meta` table; migrations applied on startup.
 
 ## 13. Phase Cut
@@ -290,13 +333,19 @@ Phase 1 has no skips and no events. Time is set at kickoff and stays put unless
 
 ## 14. Open / Deferred Decisions
 
-- Exact small classifier model name on Featherless (pick at start of Phase 1: cheapest model that's good enough at structured-output classification).
-- Token budget tier strategy (must-include / should-include / nice-to-include) — designed against real prompts during Phase 1.
-- UI framework — TBD; local web app is the default direction.
-- OOC marker (`((parens))` proposed as default; user may change before play begins).
-- DB file location.
-- Embedding model choice (Phase 4).
-- sqlite-vss vs sqlite-vec (Phase 4).
+Resolved by this brainstorm (now reflected in §3 / §6 / §12 above):
+- ~~Classifier model name~~ → `NousResearch/Hermes-3-Llama-3.1-8B`, with documented fallback chain.
+- ~~Token budget tier strategy~~ → §3.2 (8K / 6K narrative, 4K classifier; must / should / nice tiers).
+- ~~UI framework~~ → FastAPI + HTMX + SSE, multi-tab sync as a Phase 1 requirement (§3.1).
+- ~~OOC marker~~ → `((double parens))`, configurable.
+- ~~DB file location~~ → project-folder `/data/` tree (§12).
+
+Still deferred:
+- **Embedding model** (Phase 4 — pick whatever's cheap and good enough on Featherless or local at the time).
+- **sqlite-vss vs sqlite-vec** (Phase 4 — pick based on the projects' state at the time).
+- **Significance scoring rubric** — what does 0/1/2/3 mean? Drafted during Phase 1 against real scenes.
+- **Activity-record action verbs** — open vocabulary or constrained list? Decided during Phase 1 implementation.
+- **Drawer edit-affordance UX** — which fields editable in v1, which slip to Phase 1.5 / Phase 4.
 
 ## 15. Non-Negotiables (rules every implementer must respect)
 
@@ -331,3 +380,8 @@ Phase 1 has no skips and no events. Time is set at kickoff and stays put unless
 | 14 | Model strategy | Small classifier model + large narrative model |
 | 15 | Reset | Full wipe + hard confirm; chat sits ready for kickoff |
 | 16 | Rollback | Rewind + regenerate (with edit-then-regenerate) |
+| 17 | UI framework | FastAPI + HTMX + SSE; multi-tab sync as a Phase 1 requirement |
+| 18 | Classifier model | `NousResearch/Hermes-3-Llama-3.1-8B` (fallbacks: `dolphin-2.9.4-llama3-8b`, `Meta-Llama-3.1-8B-Instruct-abliterated`) |
+| 19 | Token budgets | Narrative 8K hard / 6K soft; classifier 4K hard. Must/Should/Nice tiers per §3.2 |
+| 20 | OOC marker | `((double parens))`, configurable |
+| 21 | DB location | Project-folder `/data/` tree (DB, backups, snapshots, exports, config). Gitignored. `CHAT_DB_PATH` env var honored |
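
The per-chat live-update channel in §3.1 of the requirements doc (and the "Live updates" entry in CLAUDE.md) can be sketched as a small in-process broker. This is a minimal sketch under stated assumptions, not the project's implementation: each open SSE connection gets its own `asyncio.Queue`, and publishing copies the event into every queue for that chat, so two tabs both see each streamed token. `ChatBroker`, `format_sse`, `demo`, and the `"token"` event name are hypothetical; the FastAPI route itself is omitted.

```python
import asyncio
from collections import defaultdict
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager


class ChatBroker:
    """Hypothetical in-process pub/sub: one asyncio.Queue per subscriber, grouped by chat_id."""

    def __init__(self) -> None:
        self._subscribers: defaultdict[str, set[asyncio.Queue]] = defaultdict(set)

    @asynccontextmanager
    async def subscribe(self, chat_id: str) -> AsyncIterator[asyncio.Queue]:
        # Each SSE connection (browser tab) gets its own queue, so every tab
        # receives every event instead of competing for items on one queue.
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[chat_id].add(queue)
        try:
            yield queue
        finally:
            # Tab closed or disconnected: stop delivering to its queue.
            self._subscribers[chat_id].discard(queue)
            if not self._subscribers[chat_id]:
                del self._subscribers[chat_id]

    def publish(self, chat_id: str, event: str, data: str) -> int:
        # Fan the event out to every current subscriber of this chat.
        # Returns how many queues were notified (0 if no tab is open).
        queues = self._subscribers.get(chat_id, set())
        for queue in queues:
            queue.put_nowait((event, data))
        return len(queues)


def format_sse(event: str, data: str) -> str:
    # SSE wire format: an "event:" line, one "data:" line per payload line,
    # and a blank line terminating the message.
    lines = [f"event: {event}"]
    lines += [f"data: {line}" for line in data.splitlines() or [""]]
    return "\n".join(lines) + "\n\n"


async def demo() -> list[tuple[str, str]]:
    # Two "tabs" on the same chat both receive one published token.
    broker = ChatBroker()
    async with broker.subscribe("chat-1") as tab_a, broker.subscribe("chat-1") as tab_b:
        broker.publish("chat-1", "token", "Hello")
        return [await tab_a.get(), await tab_b.get()]


if __name__ == "__main__":
    print(asyncio.run(demo()))  # prints [('token', 'Hello'), ('token', 'Hello')]
```

In a FastAPI handler, the events endpoint would enter `broker.subscribe(chat_id)`, send the "current state" snapshot, then loop on `queue.get()` and yield `format_sse(...)` strings from a `StreamingResponse` with `media_type="text/event-stream"`. Subscribing before building the snapshot means an event published in between is queued rather than lost, at the cost of occasionally re-sending something the snapshot already reflects, so the HTMX swaps should be idempotent.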
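
The must / should / nice trimming order from §3.2 can be sketched as below. This is a simplification under stated assumptions: each prompt section is dropped whole (the doc's NICE tier instead shrinks retrieval from K=4 to K=2 and summarizes older turns), "K" is read as 1,000 tokens, and `approx_tokens` is a placeholder character-based counter standing in for `tiktoken`. `Section`, `trim_to_budget`, and the section names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Literal

Tier = Literal["must", "should", "nice"]

# Budgets from §3.2, reading "K" as 1,000 tokens (assumption).
NARRATIVE_SOFT_TOKENS = 6_000
NARRATIVE_HARD_TOKENS = 8_000


@dataclass(frozen=True)
class Section:
    name: str
    tier: Tier
    text: str


def approx_tokens(text: str) -> int:
    # Placeholder counter: roughly 4 characters per token, rounded up.
    # The plan in §3.2 is tiktoken's cl100k_base; pass that in via `count`.
    return (len(text) + 3) // 4


def trim_to_budget(
    sections: list[Section],
    count: Callable[[str], int] = approx_tokens,
    soft: int = NARRATIVE_SOFT_TOKENS,
    hard: int = NARRATIVE_HARD_TOKENS,
) -> tuple[list[Section], int]:
    """Drop NICE, then SHOULD sections until under the soft target; never drop MUST."""
    costs = [count(s.text) for s in sections]
    total = sum(costs)
    dropped: set[int] = set()
    for tier in ("nice", "should"):
        # Within a tier, later sections are assumed lower priority and go first.
        for i in reversed(range(len(sections))):
            if total <= soft:
                break
            if sections[i].tier == tier:
                dropped.add(i)
                total -= costs[i]
    if total > hard:
        raise ValueError(f"{total} tokens left after trimming; hard ceiling is {hard}")
    kept = [s for i, s in enumerate(sections) if i not in dropped]
    return kept, total
```

To count with `tiktoken` as §3.2 plans, pass `count=lambda t: len(enc.encode(t))` with `enc = tiktoken.get_encoding("cl100k_base")`. Classifier prompts have only a hard ceiling, so they could be trimmed with `soft=4_000, hard=4_000`.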