5 Commits

Author SHA1 Message Date
Joseph Doherty fe9c497038 feat: split classifier + embeddings to local mlx-omni-server, narrative stays on Featherless
Adds RoutedLLMClient that dispatches by model name: requests matching
Settings.narrative_model go to Featherless, everything else (classifier
calls, embed) goes to a local MLX server. The local server is
mlx-omni-server (separate venv at .mlx-venv) and exposes the standard
OpenAI surface at http://127.0.0.1:10240/v1.

LocalMLXClient mirrors FeatherlessClient (AsyncOpenAI under the hood)
but with a working embed() — Featherless's /v1/embeddings always
returns 500 with completions_error, so the router unconditionally
sends embed traffic to the local backend.
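
The routing rule described above can be sketched roughly as follows. This is not the project's RoutedLLMClient: StubBackend, RoutedClientSketch, the chat() method name, and the exact-match comparison against the narrative model name are all assumptions made for illustration. Only embed() and the "narrative goes remote, everything else goes local" rule come from the commit message.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class StubBackend:
    """Hypothetical stand-in for FeatherlessClient / LocalMLXClient."""
    name: str
    calls: list = field(default_factory=list)

    async def chat(self, model: str, messages: list) -> str:
        # Record the call and return a marker showing which backend answered.
        self.calls.append(("chat", model))
        return f"{self.name}:{model}"

    async def embed(self, model: str, texts: list) -> list:
        self.calls.append(("embed", model))
        # 384 matches the dimension mentioned in the commit message.
        return [[0.0] * 384 for _ in texts]


class RoutedClientSketch:
    """Dispatch by model name; assumed to use exact string equality."""

    def __init__(self, narrative_model: str, remote: StubBackend, local: StubBackend):
        self.narrative_model = narrative_model
        self.remote = remote
        self.local = local

    async def chat(self, model: str, messages: list) -> str:
        # Narrative requests go remote; classifier calls go local.
        backend = self.remote if model == self.narrative_model else self.local
        return await backend.chat(model, messages)

    async def embed(self, model: str, texts: list) -> list:
        # Embeddings always go local, since the remote endpoint is unusable.
        return await self.local.embed(model, texts)


remote = StubBackend("featherless")
local = StubBackend("mlx")
router = RoutedClientSketch("narrative-model", remote, local)

narrative_reply = asyncio.run(router.chat("narrative-model", []))
classifier_reply = asyncio.run(router.chat("classifier-model", []))
vectors = asyncio.run(router.embed("embedding-model", ["hello"]))
```

Keeping the embed path unconditional means a misconfigured model name can never send embedding traffic to the backend that is known to fail.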

Production deployment overrides via data/config.toml:
- classifier_model = mlx-community/Hermes-3-Llama-3.1-8B-8bit (~8 GB)
- embedding_model = mlx-community/bge-small-en-v1.5-bf16 (~150 MB,
  384 dim — matches existing schema, no migration)

Defaults stay remote / pseudo so fresh installs and tests need no
external infra. Smoke-tested live: classifier returns expected output,
BGE produces correctly-clustering 384-dim vectors (cat-on-mat closer
to cat-on-rug than to quantum-mechanics).
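
The clustering check can be expressed as a cosine-similarity comparison. The sketch below uses hand-made 3-dimensional toy vectors, not real BGE output, purely to show the shape of the assertion; a live check would substitute the 384-dim vectors returned by embed().

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings (assumption, not model output).
cat_on_mat = [0.9, 0.1, 0.0]
cat_on_rug = [0.8, 0.2, 0.1]
quantum_mechanics = [0.0, 0.1, 0.9]

near = cosine_similarity(cat_on_mat, cat_on_rug)
far = cosine_similarity(cat_on_mat, quantum_mechanics)

# The smoke test passes when the related sentence scores higher.
clusters_correctly = near > far
```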

scripts/start_mlx_server.sh starts the daemon (foreground or --daemon).
.mlx-venv/ added to .gitignore.

Suite: 464 passed (was 457 → +7 new across LocalMLXClient + Router).
2026-04-27 12:05:41 -04:00
Joseph Doherty 365dacc0d0 chore: post-Phase-1 cleanup — gitignore, packaging, backlog
- .gitignore: add *.egg-info/ so editable installs don't show in git status.
- pyproject.toml: add [build-system] and [tool.setuptools.packages.find]
  scoped to chat*, fixing pip install -e . which was failing on data/
  auto-discovery.
- CLAUDE.md: add Phase 1.5 cleanup backlog section under Phase 1 status,
  capturing the small follow-ups surfaced in implementer reviews
  (open_db refactor, regenerate SSE broadcast, you-activity purge,
  drawer edits for deferred fields, NICE trim order).
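
To see why scoping discovery to chat* keeps data/ out of the package list, the following sketch approximates glob-style include matching with fnmatchcase. It illustrates the pattern semantics only and does not call setuptools; the chat.api sub-package name is hypothetical.

```python
from fnmatch import fnmatchcase

# Pattern from the commit message.
INCLUDE_PATTERNS = ["chat*"]

# Candidate dotted package names: "chat" and "data" come from the commit
# message, "chat.api" is a hypothetical sub-package for illustration.
candidates = ["chat", "chat.api", "data"]


def included(name: str, patterns: list) -> bool:
    """True when the dotted package name matches any include pattern."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


selected = [name for name in candidates if included(name, INCLUDE_PATTERNS)]
```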
2026-04-26 14:39:10 -04:00
Joseph Doherty 4a60171035 feat: project skeleton with health endpoint
2026-04-26 11:23:38 -04:00
Joseph Doherty 5869f1c5ce docs: lock remaining v1 design decisions
Resolves the open/deferred decisions from the v1 requirements brainstorm:
runtime stack, classifier model, token budgets, OOC marker, data layout.

- Runtime: FastAPI + HTMX + SSE (multi-tab sync is a Phase 1 requirement,
  not a polish item). 127.0.0.1 only, no auth in v1.
- Classifier model: NousResearch/Hermes-3-Llama-3.1-8B with documented
  fallback chain (dolphin-2.9.4-llama3-8b, Meta-Llama-3.1-8B-abliterated).
- Token budgets: 8K hard / 6K soft for narrative, 4K hard for classifier;
  Must/Should/Nice trimming tiers spelled out in §3.2.
- OOC marker locked to ((double parens)), configurable.
- All runtime data lives under <repo>/data/ (DB, backups, snapshots,
  exports, config). Tree is gitignored. CHAT_DB_PATH env var honored.
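
The configurable OOC marker in the list above could be handled along these lines. This is a sketch under assumptions: the helper names are hypothetical, nested markers are not supported, and the actual parsing rules live in the requirements doc rather than here.

```python
import re


def make_ooc_pattern(open_marker: str = "((", close_marker: str = "))") -> re.Pattern:
    """Build a non-greedy pattern for text wrapped in the given markers."""
    return re.compile(
        re.escape(open_marker) + r"(.+?)" + re.escape(close_marker),
        re.DOTALL,
    )


def split_ooc(text: str, pattern: re.Pattern) -> tuple:
    """Return (in-character text, list of out-of-character remarks)."""
    ooc = [remark.strip() for remark in pattern.findall(text)]
    # Replace each OOC span with a space, then collapse repeated whitespace.
    in_character = " ".join(pattern.sub(" ", text).split())
    return in_character, ooc


default_result = split_ooc(
    "I open the door. ((can we skip ahead?)) Then I wait.",
    make_ooc_pattern(),
)
custom_result = split_ooc("Hello [[brb]]", make_ooc_pattern("[[", "]]"))
```

Escaping the markers with re.escape is what keeps the marker configurable: any literal pair can be swapped in without it being read as regex syntax.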

CLAUDE.md and the requirements doc updated to match. Decisions log in
the requirements doc appendix extended with the new locks (#17–21).
2026-04-26 10:56:51 -04:00
Joseph Doherty 2f94ba7291 docs: add v1 requirements design + project README
- docs/plans/2026-04-26-v1-requirements-design.md captures the v1
  product requirements and behavioral spec from the initial brainstorm
  (use case, scope, data model, authoring, play loop, memory, time,
  rollback, phase cut, non-negotiable rules).
- README.md introduces the project for the gitea repo.
- CLAUDE.md updated to reference the requirements doc.
- .gitignore added for macOS metadata.
2026-04-26 10:46:03 -04:00