chat/scripts at f775eb7e9278fe8b53da1cc6b8dfdf98152bb6d1 - chat

Files

T

Joseph Doherty fe9c497038 feat: split classifier + embeddings to local mlx-omni-server, narrative stays on Featherless

Adds RoutedLLMClient that dispatches by model name: requests matching
Settings.narrative_model go to Featherless, everything else (classifier
calls, embed) goes to a local MLX server. The local server is
mlx-omni-server (separate venv at .mlx-venv) and exposes the standard
OpenAI surface at http://127.0.0.1:10240/v1.

LocalMLXClient mirrors FeatherlessClient (AsyncOpenAI under the hood)
but with a working embed() — Featherless's /v1/embeddings always
returns 500 with completions_error, so the router unconditionally
sends embed traffic to the local backend.

Production deployment overrides via data/config.toml:
- classifier_model = mlx-community/Hermes-3-Llama-3.1-8B-8bit (~8 GB)
- embedding_model = mlx-community/bge-small-en-v1.5-bf16 (~150 MB,
  384 dim — matches existing schema, no migration)

Defaults stay remote / pseudo so fresh installs and tests need no
external infra. Smoke-tested live: classifier returns expected output,
BGE produces correctly-clustering 384-dim vectors (cat-on-mat closer
to cat-on-rug than to quantum-mechanics).

scripts/start_mlx_server.sh starts the daemon (foreground or --daemon).
.mlx-venv/ added to .gitignore.

Suite: 464 passed (was 457 → +7 new across LocalMLXClient + Router).

2026-04-27 12:05:41 -04:00

backfill_embeddings.py

feat: backfill_embeddings --re-embed-all flag for model swaps (T112.4)

2026-04-27 06:02:23 -04:00

seed_sample_bots.py

chore: add scripts/seed_sample_bots.py

2026-04-26 14:50:06 -04:00

start_mlx_server.sh

feat: split classifier + embeddings to local mlx-omni-server, narrative stays on Featherless

2026-04-27 12:05:41 -04:00