fe9c497038
Adds RoutedLLMClient that dispatches by model name: requests matching Settings.narrative_model go to Featherless, everything else (classifier calls, embed) goes to a local MLX server.

The local server is mlx-omni-server (separate venv at .mlx-venv) and exposes the standard OpenAI surface at http://127.0.0.1:10240/v1. LocalMLXClient mirrors FeatherlessClient (AsyncOpenAI under the hood) but with a working embed(): Featherless's /v1/embeddings always returns 500 with completions_error, so the router unconditionally sends embed traffic to the local backend.

Production deployment overrides via data/config.toml:
- classifier_model = mlx-community/Hermes-3-Llama-3.1-8B-8bit (~8 GB)
- embedding_model = mlx-community/bge-small-en-v1.5-bf16 (~150 MB, 384 dim; matches existing schema, no migration)

Defaults stay remote / pseudo so fresh installs and tests need no external infra.

Smoke-tested live: classifier returns expected output, BGE produces correctly-clustering 384-dim vectors (cat-on-mat closer to cat-on-rug than to quantum-mechanics).

scripts/start_mlx_server.sh starts the daemon (foreground or --daemon). .mlx-venv/ added to .gitignore.

Suite: 464 passed (was 457 → +7 new across LocalMLXClient + Router).
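
The embed path described above can be checked by hand against the local server. The following is a minimal sketch, not part of this commit: it assumes the server accepts the usual OpenAI-style `POST /v1/embeddings` body (the commit only says it exposes "the standard OpenAI surface"), and the names `MLX_BASE_URL`, `EMBED_MODEL`, `embed_payload`, and `embed` are hypothetical helpers introduced here for illustration.

```shell
# Hypothetical defaults, taken from the values quoted in the commit message.
MLX_BASE_URL="${MLX_BASE_URL:-http://127.0.0.1:10240/v1}"
EMBED_MODEL="${EMBED_MODEL:-mlx-community/bge-small-en-v1.5-bf16}"

# Build an OpenAI-style embeddings request body for a single input string.
# Only backslashes and double quotes are escaped; inputs containing newlines
# or other control characters would need a real JSON encoder.
embed_payload() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"model":"%s","input":"%s"}' "$EMBED_MODEL" "$escaped"
}

# POST the payload to the local server. Requires the server to be running;
# the /embeddings path is an assumption based on the OpenAI API layout.
embed() {
  curl -sS -X POST "${MLX_BASE_URL}/embeddings" \
    -H 'Content-Type: application/json' \
    -d "$(embed_payload "$1")"
}
```

With the server up, `embed 'the cat sat on the mat'` should return a JSON response whose vector has 384 entries if the BGE model from the config override is in use. The first call may be slow while the model downloads.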
39 lines
1.3 KiB
Bash
Executable File
#!/usr/bin/env bash
# Start the local mlx-omni-server that serves the classifier + embedding
# models. The chat app's RoutedLLMClient routes everything except the
# narrative model to this server; with no MLX server running, classifier
# calls fail and embeddings degrade to the zero-vector fallback.
#
# Run in the foreground:
#   ./scripts/start_mlx_server.sh
# Run as a background daemon (logs to data/mlx-server.log):
#   ./scripts/start_mlx_server.sh --daemon
#
# Models are pulled from Hugging Face on first request; expect a delay
# the first time you exercise the classifier or embedding path.

set -euo pipefail

REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
VENV="${REPO_ROOT}/.mlx-venv"
LOG="${REPO_ROOT}/data/mlx-server.log"
PORT="${MLX_PORT:-10240}"
HOST="${MLX_HOST:-127.0.0.1}"

if [ ! -x "${VENV}/bin/mlx-omni-server" ]; then
  echo "error: mlx-omni-server not installed in ${VENV}" >&2
  echo "create the venv with:" >&2
  echo "  python3.12 -m venv ${VENV} && ${VENV}/bin/pip install mlx-omni-server" >&2
  exit 1
fi

if [ "${1:-}" = "--daemon" ]; then
  mkdir -p "$(dirname "${LOG}")"
  nohup "${VENV}/bin/mlx-omni-server" --host "${HOST}" --port "${PORT}" \
    >>"${LOG}" 2>&1 &
  echo "mlx-omni-server started in background (pid $!)"
  echo "logs: ${LOG}"
else
  exec "${VENV}/bin/mlx-omni-server" --host "${HOST}" --port "${PORT}"
fi
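
In `--daemon` mode the script returns as soon as the process is launched, before the server is necessarily listening. A caller that needs to wait could use something like the sketch below. It is not part of the script: `wait_for_port` is a hypothetical helper, and it relies on bash's `/dev/tcp` pseudo-device, so it has to run under bash rather than a plain POSIX `sh`.

```shell
# Return 0 once HOST:PORT accepts a TCP connection, or 1 after roughly
# TIMEOUT seconds (default 30). Polls once per second.
wait_for_port() {
  host="$1"
  port="$2"
  timeout="${3:-30}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # The subshell opens and immediately drops the connection; a refused
    # connection makes the redirection (and so the subshell) fail.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}

# Example use after starting the daemon (same defaults as the script):
#   ./scripts/start_mlx_server.sh --daemon
#   wait_for_port "${MLX_HOST:-127.0.0.1}" "${MLX_PORT:-10240}" 60 \
#     || echo "mlx-omni-server did not come up; see data/mlx-server.log" >&2
```

An open port only shows that the process is listening. Models are still pulled on first request, so the first classifier or embedding call can be slow even after this check succeeds.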