dohertj2/chat

Author	SHA1	Message	Date
Joseph Doherty	fe9c497038	feat: split classifier + embeddings to local mlx-omni-server, narrative stays on Featherless Adds RoutedLLMClient that dispatches by model name: requests matching Settings.narrative_model go to Featherless, everything else (classifier calls, embed) goes to a local MLX server. The local server is mlx-omni-server (separate venv at .mlx-venv) and exposes the standard OpenAI surface at http://127.0.0.1:10240/v1. LocalMLXClient mirrors FeatherlessClient (AsyncOpenAI under the hood) but with a working embed() — Featherless's /v1/embeddings always returns 500 with completions_error, so the router unconditionally sends embed traffic to the local backend. Production deployment overrides via data/config.toml: - classifier_model = mlx-community/Hermes-3-Llama-3.1-8B-8bit (~8 GB) - embedding_model = mlx-community/bge-small-en-v1.5-bf16 (~150 MB, 384 dim — matches existing schema, no migration) Defaults stay remote / pseudo so fresh installs and tests need no external infra. Smoke-tested live: classifier returns expected output, BGE produces correctly-clustering 384-dim vectors (cat-on-mat closer to cat-on-rug than to quantum-mechanics). scripts/start_mlx_server.sh starts the daemon (foreground or --daemon). .mlx-venv/ added to .gitignore. Suite: 464 passed (was 457 → +7 new across LocalMLXClient + Router).	2026-04-27 12:05:41 -04:00
Joseph Doherty	365dacc0d0	chore: post-Phase-1 cleanup — gitignore, packaging, backlog - .gitignore: add .egg-info/ so editable installs don't show in git status. - pyproject.toml: add [build-system] and [tool.setuptools.packages.find] scoped to chat, fixing pip install -e . which was failing on data/ auto-discovery. - CLAUDE.md: add Phase 1.5 cleanup backlog section under Phase 1 status, capturing the small follow-ups surfaced in implementer reviews (open_db refactor, regenerate SSE broadcast, you-activity purge, drawer edits for deferred fields, NICE trim order).	2026-04-26 14:39:10 -04:00
Joseph Doherty	4a60171035	feat: project skeleton with health endpoint	2026-04-26 11:23:38 -04:00
Joseph Doherty	5869f1c5ce	docs: lock remaining v1 design decisions Resolves the open/deferred decisions from the v1 requirements brainstorm: runtime stack, classifier model, token budgets, OOC marker, data layout. - Runtime: FastAPI + HTMX + SSE (multi-tab sync is a Phase 1 requirement, not a polish item). 127.0.0.1 only, no auth in v1. - Classifier model: NousResearch/Hermes-3-Llama-3.1-8B with documented fallback chain (dolphin-2.9.4-llama3-8b, Meta-Llama-3.1-8B-abliterated). - Token budgets: 8K hard / 6K soft for narrative, 4K hard for classifier; Must/Should/Nice trimming tiers spelled out in §3.2. - OOC marker locked to ((double parens)), configurable. - All runtime data lives under <repo>/data/ (DB, backups, snapshots, exports, config). Tree is gitignored. CHAT_DB_PATH env var honored. CLAUDE.md and the requirements doc updated to match. Decisions log in the requirements doc appendix extended with the new locks (#17–21).	2026-04-26 10:56:51 -04:00
Joseph Doherty	2f94ba7291	docs: add v1 requirements design + project README - docs/plans/2026-04-26-v1-requirements-design.md captures the v1 product requirements and behavioral spec from the initial brainstorm (use case, scope, data model, authoring, play loop, memory, time, rollback, phase cut, non-negotiable rules). - README.md introduces the project for the gitea repo. - CLAUDE.md updated to reference the requirements doc. - .gitignore added for macOS metadata.	2026-04-26 10:46:03 -04:00

5 Commits
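
Commit fe9c497038 describes a RoutedLLMClient that dispatches by model name: requests for the narrative model go to Featherless, everything else goes to the local MLX server, and embed traffic always goes local. The sketch below illustrates that dispatch rule only. It is not the repository's code: `StubBackend` and `RoutedClientSketch` are hypothetical stand-ins, the method names `chat` and `embed` are assumed, and the narrative model name is a placeholder because the log does not give one.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class StubBackend:
    """Hypothetical stand-in for a real client; records the calls it receives."""

    name: str
    calls: list = field(default_factory=list)

    async def chat(self, model: str, messages: list) -> str:
        self.calls.append(("chat", model))
        return f"{self.name}:{model}"

    async def embed(self, model: str, texts: list) -> list:
        self.calls.append(("embed", model))
        # 384 dimensions, matching the figure quoted in the commit message.
        return [[0.0] * 384 for _ in texts]


class RoutedClientSketch:
    """Dispatch by model name, as the commit message describes."""

    def __init__(self, narrative_model: str, remote: StubBackend, local: StubBackend):
        self.narrative_model = narrative_model
        self.remote = remote
        self.local = local

    async def chat(self, model: str, messages: list) -> str:
        # Only the narrative model goes remote; classifier calls stay local.
        backend = self.remote if model == self.narrative_model else self.local
        return await backend.chat(model, messages)

    async def embed(self, model: str, texts: list) -> list:
        # Embeddings are unconditionally local, whatever the model name.
        return await self.local.embed(model, texts)


if __name__ == "__main__":
    remote, local = StubBackend("remote"), StubBackend("local")
    # "narrative-model-placeholder" is an assumption, not a real model name.
    router = RoutedClientSketch("narrative-model-placeholder", remote, local)
    print(asyncio.run(router.chat("narrative-model-placeholder", [])))
    print(asyncio.run(router.chat("mlx-community/Hermes-3-Llama-3.1-8B-8bit", [])))
```

Keeping the routing decision in one place means callers pass only a model name and never need to know which backend serves it, which is presumably why the defaults can stay remote while production overrides them through configuration.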