docs: clarify FeatherlessClient.embed() rationale (verified 500 + empty embedding catalog)

Updates the docstring + test docstring for the NotImplementedError stub
shipped in T112 (Phase 4.5). Original wording said Featherless 'does
not expose /v1/embeddings'; verified the endpoint actually responds
but always returns HTTP 500 with type='completions_error' for every
model tried (text-embedding-3-small, BAAI/bge-small-en-v1.5,
sentence-transformers/all-MiniLM-L6-v2, etc.) and /v1/models has no
embedding-class entries. Stub behavior unchanged.
Joseph Doherty
2026-04-27 11:39:53 -04:00
parent a03f664407
commit b3d78c1603
2 changed files with 30 additions and 19 deletions
+19 -11
@@ -55,24 +55,32 @@ class FeatherlessClient:
                 yield delta
 
     async def embed(self, text: str, *, model: str) -> list[float]:
-        """Embeddings via Featherless — currently unsupported.
+        """Embeddings via Featherless — unsupported in practice.
 
         T112 (Phase 4.5) extends the LLMClient Protocol with ``embed()``
         for a future real-embedding swap. Featherless's OpenAI-compatible
-        surface does NOT expose ``/v1/embeddings`` at the time of writing,
-        so this implementation raises ``NotImplementedError`` rather than
-        attempting a request that would 404. The
+        surface routes ``/v1/embeddings`` (no 404), but every request
+        returns HTTP 500 ``{"error": {"type": "completions_error", ...}}``
+        — including standard names like ``text-embedding-3-small`` and
+        ``BAAI/bge-small-en-v1.5``. ``/v1/models`` confirms it: the
+        catalog has no embedding-class entries, only chat/completion
+        classes (``llama3-*``, ``gemma3-*``, ``glm5-*``, etc.).
+
+        Rather than ship a request that always 500s, this implementation
+        raises ``NotImplementedError``. The
         :func:`chat.services.embeddings.generate_embedding` wrapper
-        catches this and degrades to the existing zero-vector fallback
+        catches it and degrades to the existing zero-vector fallback
         (with the T107 warning), so misconfigured callers fail loudly in
         logs but the request path keeps working.
 
-        If Featherless ships embeddings, swap the body for an
-        ``self._client.embeddings.create(model=..., input=...)`` call
-        guarded by ``self._sem()`` (mirrors ``generate``/``stream``).
+        For real embeddings, configure a different provider (OpenAI
+        direct, Cohere, Voyage, Together, self-hosted Ollama /
+        sentence-transformers). The Mock + routing seam from T112 keeps
+        the swap to a one-class change in ``chat/llm/``.
         """
         raise NotImplementedError(
-            "Featherless does not expose /v1/embeddings; "
-            "configure a different embedding provider or stick with "
-            "the default pseudo-sha256-384 model."
+            "Featherless /v1/embeddings always returns 500 "
+            '("completions_error") and the model catalog has no '
+            "embedding class; configure a different embedding provider "
+            "or stick with the default pseudo-sha256-384 model."
         )
+11 -8
@@ -1,10 +1,12 @@
 """Tests for FeatherlessClient (Phase 4.5+).
 
 Phase 4.5 adds an ``embed()`` method to the LLMClient Protocol (T112).
-Featherless does not expose an OpenAI-compatible ``/v1/embeddings``
-endpoint, so its implementation deliberately raises
-``NotImplementedError`` to surface the gap clearly. The
-``generate_embedding`` wrapper catches this and degrades to the
+Featherless's OpenAI-compatible surface routes ``/v1/embeddings`` but
+every request returns HTTP 500 ``{"type": "completions_error"}`` (the
+router accepts the URL but the backend has no embedding handler), and
+``/v1/models`` lists no embedding-class models. The implementation
+raises ``NotImplementedError`` rather than ship a request that always
+errors; ``generate_embedding`` catches it and degrades to the
 zero-vector fallback (the existing T107 warning path).
 
 If/when Featherless ships embeddings, swap the body for a real call to
@@ -20,10 +22,11 @@ from chat.llm.featherless import FeatherlessClient
 
 @pytest.mark.asyncio
 async def test_featherless_embed_raises_not_implemented():
-    """Featherless does not expose ``/v1/embeddings`` — embed() must
-    raise ``NotImplementedError`` so callers (``generate_embedding``)
-    can degrade to the fallback zero vector + warning rather than
-    silently producing useless output."""
+    """Featherless's ``/v1/embeddings`` always 500s with
+    ``"completions_error"`` and its model catalog has no embedding
+    class — embed() must raise ``NotImplementedError`` so callers
+    (``generate_embedding``) can degrade to the fallback zero vector
+    + warning rather than silently producing useless output."""
     client = FeatherlessClient(api_key="test-key")
     with pytest.raises(NotImplementedError) as excinfo:
         await client.embed("hello world", model="bge-small-en-v1.5")
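
The docstrings above say that ``generate_embedding`` catches the ``NotImplementedError`` and degrades to a zero vector with a warning. The real wrapper is not part of this diff, so what follows is only a minimal sketch of that pattern. ``StubFeatherlessClient``, ``generate_embedding_sketch``, ``run_sketch`` and the logger name are hypothetical, and the 384-element width is an assumption inferred from the ``pseudo-sha256-384`` model name.

```python
import asyncio
import logging

logger = logging.getLogger("embeddings_sketch")  # hypothetical logger name

# Assumption: vector width inferred from the "pseudo-sha256-384" model name.
EMBEDDING_DIM = 384


class StubFeatherlessClient:
    """Hypothetical stand-in that only mirrors the embed() stub's behavior."""

    async def embed(self, text: str, *, model: str) -> list[float]:
        raise NotImplementedError("Featherless /v1/embeddings always returns 500")


async def generate_embedding_sketch(client, text: str, *, model: str) -> list[float]:
    """Return the provider's vector, or a zero vector if embed() is unsupported."""
    try:
        return await client.embed(text, model=model)
    except NotImplementedError as exc:
        # Log loudly, but keep the request path working.
        logger.warning("embed() unsupported (%s); using zero-vector fallback", exc)
        return [0.0] * EMBEDDING_DIM


def run_sketch() -> list[float]:
    """Drive the sketch once with the stub client."""
    client = StubFeatherlessClient()
    return asyncio.run(
        generate_embedding_sketch(client, "hello world", model="bge-small-en-v1.5")
    )
```

Raising in the client and catching in one wrapper keeps the fallback policy in a single place, so swapping in a provider that does return embeddings would not need to touch the callers.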