docs: lock remaining v1 design decisions
Resolves the open/deferred decisions from the v1 requirements brainstorm: runtime stack, classifier model, token budgets, OOC marker, data layout.

- Runtime: FastAPI + HTMX + SSE (multi-tab sync is a Phase 1 requirement, not a polish item). 127.0.0.1 only, no auth in v1.
- Classifier model: NousResearch/Hermes-3-Llama-3.1-8B with documented fallback chain (dolphin-2.9.4-llama3-8b, Meta-Llama-3.1-8B-abliterated).
- Token budgets: 8K hard / 6K soft for narrative, 4K hard for classifier; Must/Should/Nice trimming tiers spelled out in §3.2.
- OOC marker locked to ((double parens)), configurable.
- All runtime data lives under <repo>/data/ (DB, backups, snapshots, exports, config). Tree is gitignored. CHAT_DB_PATH env var honored.

CLAUDE.md and the requirements doc updated to match. Decisions log in the requirements doc appendix extended with the new locks (#17–21).
@@ -23,9 +23,22 @@ The 3-entity cap is load-bearing: it makes the relationship graph fully enumerab
## Architecture
- **Mac (always-on)**: web UI, orchestrator, persistence, event queue, retrieval, prompt construction, all state.
- **Inference endpoint**: stateless `generate(prompt, params) -> text`. Swap implementations (cloud API, rented GPU, local MLX/llama.cpp) behind one interface. The orchestrator never knows which.
- **Inference endpoint**: stateless `generate(prompt, params) -> text`. Swap implementations behind one interface. The orchestrator never knows which.
- Streaming required for UX.
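
A minimal sketch of the single-interface idea above, assuming a Python orchestrator. `GenerateParams`, `InferenceEndpoint`, `EchoEndpoint`, and `run_turn` are hypothetical names introduced for illustration; the source only specifies `generate(prompt, params) -> text`. Streaming is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class GenerateParams:
    # Hypothetical parameter bundle; the source only says `params`.
    max_tokens: int = 256
    temperature: float = 0.8


class InferenceEndpoint(Protocol):
    """Anything with a stateless generate(prompt, params) -> text."""

    def generate(self, prompt: str, params: GenerateParams) -> str: ...


class EchoEndpoint:
    """Stand-in backend: returns the first `max_tokens` words of the prompt."""

    def generate(self, prompt: str, params: GenerateParams) -> str:
        return " ".join(prompt.split()[: params.max_tokens])


def run_turn(endpoint: InferenceEndpoint, prompt: str) -> str:
    # The orchestrator depends only on the protocol, never on a concrete backend.
    return endpoint.generate(prompt, GenerateParams(max_tokens=3))
```

A cloud API client, a rented-GPU client, or a local runner could each satisfy the same protocol, so swapping one for another would not touch `run_turn`.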
## Runtime stack (locked for v1)
- **Backend**: Python 3.11+ with **FastAPI**.
- **Frontend**: server-rendered HTML + **HTMX** + minimal vanilla JS/CSS. No JS build chain.
- **Live updates**: SSE per chat. Per-chat `asyncio.Queue` pub/sub. Multi-tab sync is a Phase 1 requirement — two browser tabs on the same chat must mirror each other live (streamed tokens, drawer state, edge updates).
- **Inference backend**: **Featherless** (OpenAI-compatible API).
  - `narrative_model` = `dphn/Dolphin-Mistral-24B-Venice-Edition` (32K ctx, uncensored).
  - `classifier_model` = `NousResearch/Hermes-3-Llama-3.1-8B` (128K ctx, uncensored, structured-output reliable). Fallbacks: `cognitivecomputations/dolphin-2.9.4-llama3-8b` → `mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated`.
- **Token budgets**: narrative 8K hard / 6K soft; classifier 4K hard. Trim tiers must / should / nice — never trim must-include.
- **OOC marker**: `((double parens))` (configurable).
- **Data layout**: everything under `<repo>/data/` — `chat.db`, `backups/`, `snapshots/`, `exports/`, `config.toml`. The whole tree is `.gitignore`d. `CHAT_DB_PATH` env var honored as override.
- **Auth**: bind to `127.0.0.1` only in v1. No auth.
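
The per-chat `asyncio.Queue` pub/sub behind the live updates could look roughly like the sketch below. `ChatHub` and its method names are hypothetical, and the code assumes one queue per connected tab, so that every tab on the same chat receives every event. Wiring the queues into an SSE response is omitted.

```python
import asyncio
from collections import defaultdict


class ChatHub:
    """Fans out each published event to every subscriber (tab) of a chat."""

    def __init__(self) -> None:
        # chat_id -> set of queues, one queue per connected tab (assumption).
        self._subscribers: dict[str, set[asyncio.Queue[str]]] = defaultdict(set)

    def subscribe(self, chat_id: str) -> asyncio.Queue[str]:
        queue: asyncio.Queue[str] = asyncio.Queue()
        self._subscribers[chat_id].add(queue)
        return queue

    def unsubscribe(self, chat_id: str, queue: asyncio.Queue[str]) -> None:
        subscribers = self._subscribers.get(chat_id)
        if subscribers is None:
            return
        subscribers.discard(queue)
        if not subscribers:
            # Drop the empty set so idle chats do not accumulate.
            del self._subscribers[chat_id]

    def publish(self, chat_id: str, event: str) -> None:
        # Use .get() so publishing to a chat with no tabs creates no entry.
        for queue in self._subscribers.get(chat_id, ()):
            queue.put_nowait(event)
```

In this arrangement each SSE handler would call `subscribe` when a tab connects, forward items from its queue to the client, and call `unsubscribe` on disconnect. Streamed tokens, drawer state, and edge updates would all go through `publish`, which is what keeps two tabs on the same chat in step.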
## Core concepts (vocabulary)
- **Entity**: `you | botA | botB`. Has identity (immutable), state (mood/goals/status), activity, per-POV memory.
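
One possible shape for the Entity concept, as a hedged sketch. All class and field names other than the `you | botA | botB` identifiers are hypothetical, and "per-POV memory" is assumed here to mean that each entity keeps its own list of remembered events.

```python
from dataclasses import dataclass, field
from typing import Literal

EntityId = Literal["you", "botA", "botB"]


@dataclass(frozen=True)
class Identity:
    # Frozen, so identity cannot be reassigned after creation.
    entity_id: EntityId
    display_name: str  # hypothetical field


@dataclass
class State:
    # Mutable state: mood / goals / status, as listed in the vocabulary.
    mood: str = "neutral"
    goals: list[str] = field(default_factory=list)
    status: str = "idle"


@dataclass
class Entity:
    identity: Identity
    state: State = field(default_factory=State)
    activity: str = ""
    # Assumption: memory is stored from this entity's own point of view.
    memory: list[str] = field(default_factory=list)
```

Splitting `Identity` into its own frozen dataclass is one way to make the immutable/mutable boundary explicit; other layouts would work equally well.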