docs: lock remaining v1 design decisions

Resolves the open/deferred decisions from the v1 requirements brainstorm: runtime stack, classifier model, token budgets, OOC marker, data layout. - Runtime: FastAPI + HTMX + SSE (multi-tab sync is a Phase 1 requirement, not a polish item). 127.0.0.1 only, no auth in v1. - Classifier model: NousResearch/Hermes-3-Llama-3.1-8B with documented fallback chain (dolphin-2.9.4-llama3-8b, Meta-Llama-3.1-8B-abliterated). - Token budgets: 8K hard / 6K soft for narrative, 4K hard for classifier; Must/Should/Nice trimming tiers spelled out in §3.2. - OOC marker locked to ((double parens)), configurable. - All runtime data lives under <repo>/data/ (DB, backups, snapshots, exports, config). Tree is gitignored. CHAT_DB_PATH env var honored. CLAUDE.md and the requirements doc updated to match. Decisions log in the requirements doc appendix extended with the new locks (#17–21).
2026-04-26 10:56:51 -04:00
parent 2f94ba7291
commit 5869f1c5ce
3 changed files with 83 additions and 13 deletions
@@ -23,9 +23,22 @@ The 3-entity cap is load-bearing: it makes the relationship graph fully enumerab
 ## Architecture

 - **Mac (always-on)**: web UI, orchestrator, persistence, event queue, retrieval, prompt construction, all state.
- **Inference endpoint**: stateless `generate(prompt, params) -> text`. Swap implementations (cloud API, rented GPU, local MLX/llama.cpp) behind one interface. The orchestrator never knows which.
+- **Inference endpoint**: stateless `generate(prompt, params) -> text`. Swap implementations behind one interface. The orchestrator never knows which.
 - Streaming required for UX.

+## Runtime stack (locked for v1)
+
+- **Backend**: Python 3.11+ with **FastAPI**.
+- **Frontend**: server-rendered HTML + **HTMX** + minimal vanilla JS/CSS. No JS build chain.
+- **Live updates**: SSE per chat. Per-chat `asyncio.Queue` pub/sub. Multi-tab sync is a Phase 1 requirement — two browser tabs on the same chat must mirror each other live (streamed tokens, drawer state, edge updates).
+- **Inference backend**: **Featherless** (OpenAI-compatible API).
+  - `narrative_model` = `dphn/Dolphin-Mistral-24B-Venice-Edition` (32K ctx, uncensored).
+  - `classifier_model` = `NousResearch/Hermes-3-Llama-3.1-8B` (128K ctx, uncensored, structured-output reliable). Fallbacks: `cognitivecomputations/dolphin-2.9.4-llama3-8b` → `mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated`.
+- **Token budgets**: narrative 8K hard / 6K soft; classifier 4K hard. Trim tiers must / should / nice — never trim must-include.
+- **OOC marker**: `((double parens))` (configurable).
+- **Data layout**: everything under `<repo>/data/` — `chat.db`, `backups/`, `snapshots/`, `exports/`, `config.toml`. The whole tree is `.gitignore`d. `CHAT_DB_PATH` env var honored as override.
+- **Auth**: bind to `127.0.0.1` only in v1. No auth.
+
 ## Core concepts (vocabulary)

 - **Entity**: `you | botA | botB`. Has identity (immutable), state (mood/goals/status), activity, per-POV memory.