chat/scripts/start_mlx_server.sh
Joseph Doherty fe9c497038 feat: split classifier + embeddings to local mlx-omni-server, narrative stays on Featherless
Adds RoutedLLMClient that dispatches by model name: requests matching
Settings.narrative_model go to Featherless, everything else (classifier
calls, embed) goes to a local MLX server. The local server is
mlx-omni-server (separate venv at .mlx-venv) and exposes the standard
OpenAI surface at http://127.0.0.1:10240/v1.

LocalMLXClient mirrors FeatherlessClient (AsyncOpenAI under the hood)
but with a working embed() — Featherless's /v1/embeddings always
returns 500 with completions_error, so the router unconditionally
sends embed traffic to the local backend.

Production deployment overrides via data/config.toml:
- classifier_model = mlx-community/Hermes-3-Llama-3.1-8B-8bit (~8 GB)
- embedding_model = mlx-community/bge-small-en-v1.5-bf16 (~150 MB,
  384 dim — matches existing schema, no migration)

Defaults stay remote / pseudo so fresh installs and tests need no
external infra. Smoke-tested live: classifier returns expected output,
BGE produces correctly-clustering 384-dim vectors (cat-on-mat closer
to cat-on-rug than to quantum-mechanics).
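
A rough shell equivalent of the embedding half of that smoke test, as a sketch only. It assumes curl is installed and that the local server accepts the standard OpenAI-style POST to /embeddings under the base URL given above. The helper names (embed_payload, embed_text) and the EMBED_MODEL variable are hypothetical and are not part of this commit.

```shell
#!/usr/bin/env bash
# Sketch: request an embedding from the local MLX server with curl.
# Assumptions: curl is available; the server accepts an OpenAI-style
# POST /v1/embeddings body of the form {"model": ..., "input": ...}.
set -euo pipefail

# Same host/port defaults as scripts/start_mlx_server.sh.
MLX_BASE_URL="http://${MLX_HOST:-127.0.0.1}:${MLX_PORT:-10240}/v1"
# EMBED_MODEL is a hypothetical override; the default is the production
# embedding model named in the commit message.
EMBED_MODEL="${EMBED_MODEL:-mlx-community/bge-small-en-v1.5-bf16}"

# Build the JSON request body. Only backslashes and double quotes are
# escaped, so input containing newlines or control characters is not
# handled by this sketch.
embed_payload() {
    local model="$1" text="$2"
    text=${text//\\/\\\\}
    text=${text//\"/\\\"}
    printf '{"model":"%s","input":"%s"}' "${model}" "${text}"
}

# POST the payload and print the raw JSON response.
embed_text() {
    local text="$1"
    curl --silent --show-error --fail \
        -H 'Content-Type: application/json' \
        -d "$(embed_payload "${EMBED_MODEL}" "${text}")" \
        "${MLX_BASE_URL}/embeddings"
}

# Example (needs the server to be running):
#   embed_text "the cat sat on the mat"
```

If the server follows the OpenAI response shape, the vector should appear under data[0].embedding, with 384 entries for the BGE model.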

scripts/start_mlx_server.sh starts the daemon (foreground or --daemon).
.mlx-venv/ added to .gitignore.

Suite: 464 passed (was 457 → +7 new across LocalMLXClient + Router).
2026-04-27 12:05:41 -04:00

39 lines, 1.3 KiB, Bash, executable file

#!/usr/bin/env bash
# Start the local mlx-omni-server that serves the classifier + embedding
# models. The chat app's RoutedLLMClient routes everything except the
# narrative model to this server; with no MLX server running, classifier
# calls fail and embeddings degrade to the zero-vector fallback.
#
# Run in the foreground:
#   ./scripts/start_mlx_server.sh
# Run as a background daemon (logs to data/mlx-server.log):
#   ./scripts/start_mlx_server.sh --daemon
#
# Models are pulled from Hugging Face on first request; expect a delay
# the first time you exercise the classifier or embedding path.
set -euo pipefail

REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
VENV="${REPO_ROOT}/.mlx-venv"
LOG="${REPO_ROOT}/data/mlx-server.log"
PORT="${MLX_PORT:-10240}"
HOST="${MLX_HOST:-127.0.0.1}"

if [ ! -x "${VENV}/bin/mlx-omni-server" ]; then
    echo "error: mlx-omni-server not installed in ${VENV}" >&2
    echo "create the venv with:" >&2
    echo "  python3.12 -m venv ${VENV} && ${VENV}/bin/pip install mlx-omni-server" >&2
    exit 1
fi

if [ "${1:-}" = "--daemon" ]; then
    mkdir -p "$(dirname "${LOG}")"
    nohup "${VENV}/bin/mlx-omni-server" --host "${HOST}" --port "${PORT}" \
        >>"${LOG}" 2>&1 &
    echo "mlx-omni-server started in background (pid $!)"
    echo "logs: ${LOG}"
else
    exec "${VENV}/bin/mlx-omni-server" --host "${HOST}" --port "${PORT}"
fi
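
In --daemon mode the script returns as soon as the process is launched, so a caller may want to wait until the server actually answers before sending classifier or embedding traffic. The following is a minimal sketch of such a readiness check, not part of the script above. It assumes curl is installed and that the server answers GET /v1/models as part of its OpenAI-style surface; the function names mlx_base_url and wait_for_mlx are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: poll the local mlx-omni-server until it responds, or give up.
# Assumptions: curl is available and GET /v1/models is served; neither
# is confirmed by start_mlx_server.sh itself.
set -euo pipefail

# Build the base URL from the same defaults start_mlx_server.sh uses.
mlx_base_url() {
    printf 'http://%s:%s/v1' "${MLX_HOST:-127.0.0.1}" "${MLX_PORT:-10240}"
}

# Try once per second, up to the given number of attempts (default 30).
wait_for_mlx() {
    local attempts="${1:-30}"
    local url i
    url="$(mlx_base_url)/models"
    for ((i = 1; i <= attempts; i++)); do
        if curl --silent --fail --max-time 2 --output /dev/null "${url}"; then
            echo "mlx-omni-server is up at ${url}"
            return 0
        fi
        sleep 1
    done
    echo "error: no response from ${url} after ${attempts} attempts" >&2
    return 1
}

# Example:
#   ./scripts/start_mlx_server.sh --daemon && wait_for_mlx 60
```

Note that this only checks that the HTTP server is listening. Models are still pulled on first request, so the first classifier or embedding call can be slow even after the check passes.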