Adds RoutedLLMClient that dispatches by model name: requests matching
Settings.narrative_model go to Featherless; everything else (classifier
calls, embeddings) goes to a local MLX server. The local server is
mlx-omni-server (separate venv at .mlx-venv) and exposes the standard
OpenAI surface at http://127.0.0.1:10240/v1.

LocalMLXClient mirrors FeatherlessClient (AsyncOpenAI under the hood)
but with a working embed() — Featherless's /v1/embeddings always
returns 500 with completions_error, so the router unconditionally
sends embed traffic to the local backend.
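
A minimal sketch of the dispatch rule described above, not the actual
RoutedLLMClient. StubBackend, RouterSketch, the method signatures, and
the model names are hypothetical stand-ins, and "matching" is assumed
to mean exact equality with the configured narrative model:

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class StubBackend:
    """Hypothetical stand-in for FeatherlessClient / LocalMLXClient; records calls."""
    name: str
    calls: list = field(default_factory=list)

    async def chat(self, model: str, prompt: str) -> str:
        self.calls.append(("chat", model))
        return f"{self.name} handled {model}"

    async def embed(self, model: str, texts: list) -> list:
        self.calls.append(("embed", model))
        return [[0.0] * 384 for _ in texts]  # placeholder 384-dim vectors


@dataclass
class RouterSketch:
    narrative_model: str
    remote: StubBackend
    local: StubBackend

    def _backend_for(self, model: str) -> StubBackend:
        # Assumption: routing is an exact match on the model name.
        return self.remote if model == self.narrative_model else self.local

    async def chat(self, model: str, prompt: str) -> str:
        return await self._backend_for(model).chat(model, prompt)

    async def embed(self, model: str, texts: list) -> list:
        # Embeddings always go local, whatever the model name is.
        return await self.local.embed(model, texts)


async def demo():
    router = RouterSketch("narrative-x", StubBackend("featherless"), StubBackend("mlx"))
    a = await router.chat("narrative-x", "hi")
    b = await router.chat("classifier-y", "hi")
    vecs = await router.embed("narrative-x", ["hi"])
    return router, a, b, vecs


router, narrative_reply, classifier_reply, vectors = asyncio.run(demo())
```

Only the narrative chat call reaches the remote stub; the classifier
call and the embed call both land on the local one.
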
Production deployment overrides via data/config.toml:
- classifier_model = mlx-community/Hermes-3-Llama-3.1-8B-8bit (~8 GB)
- embedding_model = mlx-community/bge-small-en-v1.5-bf16 (~150 MB,
  384 dim; matches existing schema, no migration)
Defaults stay remote / pseudo so fresh installs and tests need no
external infra. Smoke-tested live: classifier returns expected output,
BGE produces correctly-clustering 384-dim vectors (cat-on-mat closer
to cat-on-rug than to quantum-mechanics).
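
The clustering check amounts to a cosine-similarity comparison. A
sketch with toy 3-dim vectors standing in for real 384-dim BGE output
(the numbers are made up for illustration and are not model results):

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closer_to(anchor: list, near: list, far: list) -> bool:
    """True if `anchor` is more similar to `near` than to `far`."""
    return cosine(anchor, near) > cosine(anchor, far)


# Toy vectors, not real embeddings.
cat_on_mat = [0.9, 0.1, 0.0]
cat_on_rug = [0.8, 0.2, 0.1]
quantum_mechanics = [0.0, 0.1, 0.9]

clusters_ok = closer_to(cat_on_mat, cat_on_rug, quantum_mechanics)
```

With real vectors the same comparison would run on the three embed()
results instead of the hard-coded lists.
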
scripts/start_mlx_server.sh starts the daemon (foreground or --daemon).
.mlx-venv/ added to .gitignore.
Suite: 464 passed (was 457 → +7 new across LocalMLXClient + Router).

Adds two new flags to the backfill script:

* --re-embed-all walks **every** memory (not just those without
  an existing embeddings row) and re-emits embedding_indexed
  events. The projector is INSERT OR REPLACE, so re-emitting an event
  for an existing memory replaces the prior vector. Use this when
  swapping embedding models; the default mode still keeps the Phase
  4 gap-fill behavior.
* --model M overrides Settings.embedding_model for this run.
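
A sketch of why re-emitting is safe, using an in-memory SQLite
database. The table and column names, memories_to_embed, and
project_embedding_indexed are assumptions for illustration; the real
schema and projector may differ:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (id TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE embeddings ("
    "memory_id TEXT PRIMARY KEY, model TEXT NOT NULL, vector BLOB NOT NULL)"
)
conn.executemany("INSERT INTO memories (id) VALUES (?)", [("m1",), ("m2",)])


def memories_to_embed(conn, re_embed_all: bool) -> list:
    """Default: only memories lacking an embeddings row. Flag set: every memory."""
    if re_embed_all:
        sql = "SELECT id FROM memories ORDER BY id"
    else:
        sql = (
            "SELECT m.id FROM memories m "
            "LEFT JOIN embeddings e ON e.memory_id = m.id "
            "WHERE e.memory_id IS NULL ORDER BY m.id"
        )
    return [row[0] for row in conn.execute(sql)]


def project_embedding_indexed(conn, memory_id: str, model: str, vector: bytes) -> None:
    """INSERT OR REPLACE keyed on memory_id, so a second event overwrites the first."""
    conn.execute(
        "INSERT OR REPLACE INTO embeddings (memory_id, model, vector) VALUES (?, ?, ?)",
        (memory_id, model, vector),
    )


project_embedding_indexed(conn, "m1", "old-model", b"\x00\x01")
gap_fill = memories_to_embed(conn, re_embed_all=False)
everything = memories_to_embed(conn, re_embed_all=True)

# Re-emitting for m1 with a new model replaces the row rather than duplicating it.
project_embedding_indexed(conn, "m1", "new-model", b"\x02\x03")
rows = conn.execute("SELECT memory_id, model, vector FROM embeddings").fetchall()
```

The replace only works because memory_id is the primary key here; a
table without a uniqueness constraint would accumulate duplicate rows.
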
The script also gains a small _build_client helper that returns
None for the pseudo path (no client needed) and a FeatherlessClient
otherwise; tests monkeypatch this to inject a Mock with canned
embeddings.
Adds tests/test_backfill_embeddings.py with three integration
tests: re-embed-all walks every memory, default mode skips existing
rows, and --model overrides the configured model end-to-end.

Idempotent seeder for three sample bots (Maya: coworker slow-burn,
Eli: live-in partner, Sam: bartender / new connection). Each is a
distinct relational archetype to exercise the system from different
angles. Run from repo root:

    .venv/bin/python scripts/seed_sample_bots.py

Re-running skips ids that already exist. After seeding, walk each bot
through kickoff parse-and-confirm at /bots/<id>/kickoff.
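
The skip-if-exists behavior can be sketched as below. The bots table,
its columns, the id values, and seed() are assumptions for
illustration; the actual script may store bots differently:

```python
import sqlite3

# Hypothetical ids and fields; only the names and archetypes come from the text above.
SAMPLE_BOTS = [
    ("maya", "Maya", "coworker slow-burn"),
    ("eli", "Eli", "live-in partner"),
    ("sam", "Sam", "bartender / new connection"),
]


def seed(conn) -> list:
    """Insert any sample bot whose id is missing; return the ids actually created."""
    created = []
    for bot_id, name, archetype in SAMPLE_BOTS:
        exists = conn.execute("SELECT 1 FROM bots WHERE id = ?", (bot_id,)).fetchone()
        if exists:
            continue  # already seeded, leave the existing row untouched
        conn.execute(
            "INSERT INTO bots (id, name, archetype) VALUES (?, ?, ?)",
            (bot_id, name, archetype),
        )
        created.append(bot_id)
    conn.commit()
    return created


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bots (id TEXT PRIMARY KEY, name TEXT, archetype TEXT)")
first_run = seed(conn)
second_run = seed(conn)
bot_count = conn.execute("SELECT COUNT(*) FROM bots").fetchone()[0]
```

The second run creates nothing, so the script can be re-run after a
partial failure or after adding a new sample bot to the list.
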