docs: lock remaining v1 design decisions

Resolves the open/deferred decisions from the v1 requirements brainstorm: runtime stack, classifier model, token budgets, OOC marker, data layout.

- Runtime: FastAPI + HTMX + SSE (multi-tab sync is a Phase 1 requirement, not a polish item). 127.0.0.1 only, no auth in v1.
- Classifier model: NousResearch/Hermes-3-Llama-3.1-8B with documented fallback chain (dolphin-2.9.4-llama3-8b, Meta-Llama-3.1-8B-Instruct-abliterated).
- Token budgets: 8K hard / 6K soft for narrative, 4K hard for classifier; Must/Should/Nice trimming tiers spelled out in §3.2.
- OOC marker locked to ((double parens)), configurable.
- All runtime data lives under <repo>/data/ (DB, backups, snapshots, exports, config). Tree is gitignored. CHAT_DB_PATH env var honored.

CLAUDE.md and the requirements doc updated to match. Decisions log in the requirements doc appendix extended with the new locks (#17–21).
@@ -1 +1,4 @@
.DS_Store

# v1 runtime data (DB, backups, snapshots, exports, config with secrets)
data/
@@ -23,9 +23,22 @@ The 3-entity cap is load-bearing: it makes the relationship graph fully enumerab
## Architecture

- **Mac (always-on)**: web UI, orchestrator, persistence, event queue, retrieval, prompt construction, all state.
- **Inference endpoint**: stateless `generate(prompt, params) -> text`. Swap implementations (cloud API, rented GPU, local MLX/llama.cpp) behind one interface. The orchestrator never knows which.
- **Inference endpoint**: stateless `generate(prompt, params) -> text`. Swap implementations behind one interface. The orchestrator never knows which.
- Streaming required for UX.
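
The single-interface idea above can be sketched as a structural type. This is a minimal illustration, not the project's code: the `stream` method, the `EchoEndpoint` stand-in, and the `render_turn` helper are all assumed names, and `params` is modelled as a plain mapping.

```python
from typing import Iterator, Mapping, Protocol


class InferenceEndpoint(Protocol):
    """Stateless text generation: no conversation state lives behind these calls."""

    def generate(self, prompt: str, params: Mapping[str, object]) -> str: ...

    # Assumed streaming variant; the document only says streaming is required.
    def stream(self, prompt: str, params: Mapping[str, object]) -> Iterator[str]: ...


class EchoEndpoint:
    """Hypothetical stand-in backend for tests; a real one would call a model API."""

    def stream(self, prompt, params):
        # Yield the prompt back word by word to imitate token streaming.
        for word in prompt.split():
            yield word + " "

    def generate(self, prompt, params):
        return "".join(self.stream(prompt, params)).rstrip()


def render_turn(endpoint: InferenceEndpoint, prompt: str) -> str:
    # The orchestrator depends only on the interface, never on the backend type.
    return endpoint.generate(prompt, {"max_tokens": 256})
```

Any object with matching `generate` and `stream` methods satisfies the protocol, so a cloud client and a local runner could be swapped without touching `render_turn`.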

## Runtime stack (locked for v1)

- **Backend**: Python 3.11+ with **FastAPI**.
- **Frontend**: server-rendered HTML + **HTMX** + minimal vanilla JS/CSS. No JS build chain.
- **Live updates**: SSE per chat. Per-chat `asyncio.Queue` pub/sub. Multi-tab sync is a Phase 1 requirement — two browser tabs on the same chat must mirror each other live (streamed tokens, drawer state, edge updates).
- **Inference backend**: **Featherless** (OpenAI-compatible API).
  - `narrative_model` = `dphn/Dolphin-Mistral-24B-Venice-Edition` (32K ctx, uncensored).
  - `classifier_model` = `NousResearch/Hermes-3-Llama-3.1-8B` (128K ctx, uncensored, structured-output reliable). Fallbacks: `cognitivecomputations/dolphin-2.9.4-llama3-8b` → `mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated`.
- **Token budgets**: narrative 8K hard / 6K soft; classifier 4K hard. Trim tiers must / should / nice — never trim must-include.
- **OOC marker**: `((double parens))` (configurable).
- **Data layout**: everything under `<repo>/data/` — `chat.db`, `backups/`, `snapshots/`, `exports/`, `config.toml`. The whole tree is `.gitignore`d. `CHAT_DB_PATH` env var honored as override.
- **Auth**: bind to `127.0.0.1` only in v1. No auth.
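
As a rough illustration of the data-layout rule, the sketch below resolves the database path with the `CHAT_DB_PATH` override taking precedence. The function names and the `repo_root` and `env` parameters are assumptions made for this example; only the directory names and the variable name come from the text.

```python
import os
from pathlib import Path


def resolve_db_path(repo_root: Path, env=None) -> Path:
    """Return the DB path, letting CHAT_DB_PATH override the in-repo default."""
    env = os.environ if env is None else env
    override = env.get("CHAT_DB_PATH")
    if override:
        return Path(override).expanduser()
    return repo_root / "data" / "chat.db"


def data_paths(repo_root: Path) -> dict:
    """Locations of the other runtime artifacts under <repo>/data/."""
    data = repo_root / "data"
    return {
        "backups": data / "backups",
        "snapshots": data / "snapshots",
        "exports": data / "exports",
        "config": data / "config.toml",
    }
```

Passing the environment in as a mapping keeps the lookup easy to test without mutating `os.environ`.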

## Core concepts (vocabulary)

- **Entity**: `you | botA | botB`. Has identity (immutable), state (mood/goals/status), activity, per-POV memory.
@@ -23,8 +23,9 @@ The LLM is treated as a **renderer** for structured world state, not as the stat
- **One chat per bot.** A second bot can be added as a *guest* into any chat. Hard cap: **2 bots in any scene**.
- Explicit / mature content allowed.
- **Featherless** as the LLM backend over its OpenAI-compatible API. Two model slots:
  - `narrative_model` — Dolphin-Mistral-24B-Venice (uncensored, narrative-grade).
  - `classifier_model` — small (~3B-class), TBD at Phase 1 start. Used for parsing, significance, interjection, scene-close detection, state-update passes.
  - `narrative_model` — `dphn/Dolphin-Mistral-24B-Venice-Edition` (uncensored, narrative-grade). 32K context.
  - `classifier_model` — `NousResearch/Hermes-3-Llama-3.1-8B` (uncensored, tuned for tool use / structured output). 128K context. Fallback chain if it underperforms on JSON: `cognitivecomputations/dolphin-2.9.4-llama3-8b` → `mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated`.
  - Classifier is used for: turn parsing (dialogue/action/ooc), kickoff prose parsing, scene-close detection, interjection decisions, significance scoring, state-update extraction, jump-skip memory synthesis.
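
One possible reading of the fallback chain is a per-call retry: try each model in order and accept the first reply that parses as a JSON object. The text may instead mean a one-time configuration switch after evaluation, so treat this as a sketch under that assumption. `call_model` is a hypothetical callable `(model_name, prompt) -> str`, not a Featherless API.

```python
import json
from typing import Callable, Optional

CLASSIFIER_CHAIN = [
    "NousResearch/Hermes-3-Llama-3.1-8B",
    "cognitivecomputations/dolphin-2.9.4-llama3-8b",
    "mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated",
]


def classify_json(prompt: str, call_model: Callable[[str, str], str],
                  chain=CLASSIFIER_CHAIN) -> tuple:
    """Return (model_name, parsed_dict) from the first model that emits a JSON object."""
    last_error: Optional[Exception] = None
    for model in chain:
        try:
            parsed = json.loads(call_model(model, prompt))
        except (json.JSONDecodeError, TypeError) as exc:
            last_error = exc  # malformed output: fall through to the next model
            continue
        if isinstance(parsed, dict):
            return model, parsed
    raise ValueError("no classifier model returned a JSON object") from last_error
```

Returning the model name alongside the result makes it possible to log how often the primary model fails, which is the evidence needed to decide whether to change the default.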

### 2.2 Out of scope (v1)
@@ -57,6 +58,41 @@ The orchestrator never knows which model is in use — only `generate(prompt, pa

API key handling: keys live in a local config file outside the repository. **Never** commit a key to the repo, paste in chat logs, or include in exports.

### 3.1 Runtime stack

- **Backend**: Python 3.11+ with **FastAPI** as the HTTP server.
- **Frontend**: server-rendered HTML + **HTMX** + minimal vanilla JS/CSS. No JS build chain.
- **Live updates**: Server-Sent Events (SSE) per chat. Server keeps a per-chat in-process pub/sub channel (an `asyncio.Queue` per chat_id). Every browser tab on `/chats/<id>` opens an SSE connection to `/chats/<id>/events`. State changes (new turn, streamed tokens, drawer state, edge updates, scene close) publish to the channel; all subscribed tabs receive the event and HTMX swaps the relevant DOM region.
- **Multi-tab sync** is a Phase 1 requirement, not a polish item. Two browser tabs open to the same chat must mirror each other in real time. Implications:
  - In-progress typing is tab-local until submit (no collaborative input in v1).
  - On reconnect/refresh, the server first sends a "current state" snapshot, then resumes streaming.
  - The same architecture trivially supports a phone or tablet on the LAN later — bind to `0.0.0.0` + add a shared-secret token if/when desired. Default is `127.0.0.1`, no auth.

### 3.2 Token budgets and trimming tiers

Token accounting via `tiktoken` with the closest cl100k approximation. Mistral and Llama tokenizers diverge ~5%; we accept the drift.

- **Narrative prompt**: 8K hard ceiling, 6K soft target. Leaves ~2-4K headroom for streamed output and avoids long-context performance cliffs. Plenty for our prompt shape.
- **Classifier prompt**: 4K hard ceiling. Most calls are well under 1K.

When the assembled prompt exceeds the soft target, trim in this order — never trim must-include:

- **MUST-include** (always present):
  - System message + speaker identity
  - Speaker's edge to the addressee
  - Activity snapshot for all present entities
  - Current scene description
  - Last 4 turns of dialogue
- **SHOULD-include** (trim when over budget):
  - Other edges of the speaker (e.g. speaker → other present)
  - Group node summary (when applicable)
  - Active threads
  - Currently active events + props
- **NICE-include** (trim first):
  - Retrieved memories beyond top-2 (drop K=4 to K=2)
  - Dialogue turns beyond the last 4 (replace older turns with a one-line summary)
  - Per-POV summary of the previous scene
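
The tier order can be sketched as a small trimming pass. This simplified version drops whole blocks, whereas the tiers above also describe softer reductions (K=4 to K=2, summarising older turns). The `Block` type, the whitespace-based `count_tokens` stand-in, and the rule of dropping later blocks first within a tier are all assumptions for illustration.

```python
from dataclasses import dataclass

SOFT_TARGET = 6_000              # narrative soft target from the section above
TRIM_ORDER = ("nice", "should")  # "must" blocks are never trimmed


@dataclass
class Block:
    name: str
    tier: str  # "must" | "should" | "nice"
    text: str


def count_tokens(text: str) -> int:
    # Stand-in counter (whitespace words); the document plans to use tiktoken.
    return len(text.split())


def trim_to_budget(blocks, budget=SOFT_TARGET, count=count_tokens):
    """Drop nice-tier blocks, then should-tier blocks, until the prompt fits."""
    kept = list(blocks)
    total = sum(count(b.text) for b in kept)
    for tier in TRIM_ORDER:
        # Within a tier, drop later blocks first (an assumed priority order).
        for block in reversed([b for b in kept if b.tier == tier]):
            if total <= budget:
                return kept
            kept.remove(block)
            total -= count(block.text)
    return kept  # may still exceed the budget if must-include alone is too large
```

A caller would still need to check the result against the 8K hard ceiling, since this pass never touches must-include content.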

## 4. Data Model (top-level entities)

- **Bot** — top-level persistent unit. Has identity (immutable per session), state (mood/goals/status), per-bot clock, kickoff spec.
@@ -113,7 +149,7 @@ A turn is free-form prose with conventional markers:

- `*walks over*` — action.
- Quoted or bare text — dialogue.
- `((double parens))` — out-of-character commentary or meta-instruction. Flagged but not sent to the bot. (Default; configurable before play begins.)
- `((double parens))` — out-of-character commentary or meta-instruction. Flagged but not sent to the bot. (Default; stored as a config field; the user may change it before play begins.)

A small classifier call splits the turn into segments tagged `dialogue | action | ooc`. Action segments update the user's activity record.
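
The document assigns this split to a classifier call. The sketch below is not that method: it is a deterministic, regex-based pass over the marker conventions only, which might serve as a pre-pass or as a test oracle for classifier output. `build_splitter` and its parameters are hypothetical names; the OOC delimiters are arguments because the marker is configurable.

```python
import re


def build_splitter(ooc_open="((", ooc_close="))"):
    """Return a function that tags turn segments as dialogue, action, or ooc."""
    pattern = re.compile(
        "(?P<ooc>{}.*?{})|(?P<action>\\*[^*]+\\*)".format(
            re.escape(ooc_open), re.escape(ooc_close)
        ),
        re.DOTALL,
    )

    def split(turn):
        segments, pos = [], 0
        for match in pattern.finditer(turn):
            _append_dialogue(segments, turn[pos:match.start()])
            kind = match.lastgroup  # "ooc" or "action"
            start = len(ooc_open) if kind == "ooc" else 1
            end = len(match.group()) - (len(ooc_close) if kind == "ooc" else 1)
            segments.append((kind, match.group()[start:end].strip()))
            pos = match.end()
        _append_dialogue(segments, turn[pos:])
        return segments

    return split


def _append_dialogue(segments, text):
    # Anything outside a marker is treated as dialogue (quoted or bare text).
    text = text.strip()
    if text:
        segments.append(("dialogue", text))
```

A pass like this cannot handle unmarked actions or ambiguous prose, which is presumably why the design uses a model for the real split.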
@@ -236,9 +272,16 @@ Phase 1 has no skips and no events. Time is set at kickoff and stays put unless
## 12. Persistence & Ops (v1 defaults)

- SQLite WAL mode, foreign keys on, transactional turns.
- Single DB file. Default path TBD (likely `~/Library/Application Support/chat/chat.db`).
- **Project-folder layout** (DB lives inside the repo, gitignored):
  - DB: `<repo>/data/chat.db`
  - Backups: `<repo>/data/backups/` (timestamped copies)
  - Pre-rewind snapshots: `<repo>/data/snapshots/`
  - Significant-scene JSON exports: `<repo>/data/exports/`
  - Config: `<repo>/data/config.toml` (holds Featherless API key, model names, OOC marker, K, budget, etc. Gitignored.)
- The entire `data/` tree is in `.gitignore` so secrets and state never get committed.
- `CHAT_DB_PATH` env var honored as an override if you want to point at a different file (e.g., a backup or a sibling repo's data).
- **Auto-backup** nightly via launchd. Timestamped copies. Last 14 retained. Pre-rewind snapshots are separate and not pruned.
- **Significant-scene JSON exports** written to a sibling folder when scenes close at significance ≥ 2.
- **Significant-scene JSON exports** written to `data/exports/` when scenes close at significance ≥ 2.
- Schema versioned in a `meta` table; migrations applied on startup.
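
The SQLite defaults listed above (WAL, foreign keys, a versioned `meta` table with startup migrations) might be wired up along these lines. The `meta` column names, the `schema_version` key, and the example `turns` table are assumptions; the document does not specify the schema.

```python
import sqlite3

# Hypothetical migration list: version number -> DDL to reach that version.
MIGRATIONS = {
    1: "CREATE TABLE turns (id INTEGER PRIMARY KEY, body TEXT NOT NULL)",
}


def open_db(path):
    # isolation_level=None puts the connection in autocommit mode, so
    # transactions are opened and closed explicitly with BEGIN/COMMIT.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")  # has no effect on in-memory databases
    conn.execute("PRAGMA foreign_keys=ON")
    conn.execute("CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)")
    row = conn.execute("SELECT value FROM meta WHERE key='schema_version'").fetchone()
    current = int(row[0]) if row else 0
    for version in sorted(v for v in MIGRATIONS if v > current):
        conn.execute("BEGIN")
        try:
            conn.execute(MIGRATIONS[version])
            conn.execute(
                "INSERT OR REPLACE INTO meta (key, value) VALUES ('schema_version', ?)",
                (str(version),),
            )
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")  # leave the schema at the last good version
            raise
    return conn
```

Because the connection is in autocommit mode, writing a turn transactionally would use the same explicit `BEGIN` and `COMMIT` pattern around its statements.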

## 13. Phase Cut
@@ -290,13 +333,19 @@ Phase 1 has no skips and no events. Time is set at kickoff and stays put unless

## 14. Open / Deferred Decisions

- Exact small classifier model name on Featherless (pick at start of Phase 1: cheapest model that's good enough at structured-output classification).
- Token budget tier strategy (must-include / should-include / nice-to-include) — designed against real prompts during Phase 1.
- UI framework — TBD; local web app is the default direction.
- OOC marker (`((parens))` proposed as default; user may change before play begins).
- DB file location.
- Embedding model choice (Phase 4).
- sqlite-vss vs sqlite-vec (Phase 4).
Resolved by this brainstorm (now reflected in §3 / §6 / §12 above):
- ~~Classifier model name~~ → `NousResearch/Hermes-3-Llama-3.1-8B`, with documented fallback chain.
- ~~Token budget tier strategy~~ → §3.2 (8K / 6K narrative, 4K classifier; must / should / nice tiers).
- ~~UI framework~~ → FastAPI + HTMX + SSE, multi-tab sync as a Phase 1 requirement (§3.1).
- ~~OOC marker~~ → `((double parens))`, configurable.
- ~~DB file location~~ → project-folder `<repo>/data/` tree (§12).

Still deferred:
- **Embedding model** (Phase 4 — pick whatever's cheap and good enough on Featherless or local at the time).
- **sqlite-vss vs sqlite-vec** (Phase 4 — pick based on the projects' state at the time).
- **Significance scoring rubric** — what does 0/1/2/3 mean? Drafted during Phase 1 against real scenes.
- **Activity-record action verbs** — open vocabulary or constrained list? Decided during Phase 1 implementation.
- **Drawer edit-affordance UX** — which fields editable in v1, which slip to Phase 1.5 / Phase 4.

## 15. Non-Negotiables (rules every implementer must respect)
@@ -331,3 +380,8 @@ Phase 1 has no skips and no events. Time is set at kickoff and stays put unless
| 14 | Model strategy | Small classifier model + large narrative model |
| 15 | Reset | Full wipe + hard confirm; chat sits ready for kickoff |
| 16 | Rollback | Rewind + regenerate (with edit-then-regenerate) |
| 17 | UI framework | FastAPI + HTMX + SSE; multi-tab sync as a Phase 1 requirement |
| 18 | Classifier model | `NousResearch/Hermes-3-Llama-3.1-8B` (fallbacks: `dolphin-2.9.4-llama3-8b`, `Meta-Llama-3.1-8B-Instruct-abliterated`) |
| 19 | Token budgets | Narrative 8K hard / 6K soft; classifier 4K hard. Must/Should/Nice tiers per §3.2 |
| 20 | OOC marker | `((double parens))`, configurable |
| 21 | DB location | Project-folder `<repo>/data/` tree (DB, backups, snapshots, exports, config). Gitignored. `CHAT_DB_PATH` env var honored |