docs: clarify FeatherlessClient.embed() rationale (verified 500 + empty embedding catalog)

Updates the docstring + test docstring for the NotImplementedError stub
shipped in T112 (Phase 4.5). Original wording said Featherless 'does
not expose /v1/embeddings'; verified the endpoint actually responds
but always returns HTTP 500 with type='completions_error' for every
model tried (text-embedding-3-small, BAAI/bge-small-en-v1.5,
sentence-transformers/all-MiniLM-L6-v2, etc.) and /v1/models has no
embedding-class entries. Stub behavior unchanged.
Joseph Doherty
2026-04-27 11:39:53 -04:00
parent a03f664407
commit b3d78c1603
2 changed files with 30 additions and 19 deletions
+19 -11
@@ -55,24 +55,32 @@ class FeatherlessClient:
yield delta
     async def embed(self, text: str, *, model: str) -> list[float]:
-        """Embeddings via Featherless — currently unsupported.
+        """Embeddings via Featherless — unsupported in practice.

         T112 (Phase 4.5) extends the LLMClient Protocol with ``embed()``
         for a future real-embedding swap. Featherless's OpenAI-compatible
-        surface does NOT expose ``/v1/embeddings`` at the time of writing,
-        so this implementation raises ``NotImplementedError`` rather than
-        attempting a request that would 404. The
+        surface routes ``/v1/embeddings`` (no 404), but every request
+        returns HTTP 500 ``{"error": {"type": "completions_error", ...}}``
+        — including standard names like ``text-embedding-3-small`` and
+        ``BAAI/bge-small-en-v1.5``. ``/v1/models`` confirms it: the
+        catalog has no embedding-class entries, only chat/completion
+        classes (``llama3-*``, ``gemma3-*``, ``glm5-*``, etc.).
+
+        Rather than ship a request that always 500s, this implementation
+        raises ``NotImplementedError``. The
         :func:`chat.services.embeddings.generate_embedding` wrapper
-        catches this and degrades to the existing zero-vector fallback
+        catches it and degrades to the existing zero-vector fallback
         (with the T107 warning), so misconfigured callers fail loudly in
         logs but the request path keeps working.

-        If Featherless ships embeddings, swap the body for an
-        ``self._client.embeddings.create(model=..., input=...)`` call
-        guarded by ``self._sem()`` (mirrors ``generate``/``stream``).
+        For real embeddings, configure a different provider (OpenAI
+        direct, Cohere, Voyage, Together, self-hosted Ollama /
+        sentence-transformers). The Mock + routing seam from T112 keeps
+        the swap to a one-class change in ``chat/llm/``.
         """
         raise NotImplementedError(
-            "Featherless does not expose /v1/embeddings; "
-            "configure a different embedding provider or stick with "
-            "the default pseudo-sha256-384 model."
+            "Featherless /v1/embeddings always returns 500 "
+            '("completions_error") and the model catalog has no '
+            "embedding class; configure a different embedding provider "
+            "or stick with the default pseudo-sha256-384 model."
         )
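
For reviewers who want to see the degradation path the docstring describes, here is a minimal, self-contained sketch of a caller that catches the stub's ``NotImplementedError`` and falls back to a zero vector. It is not the real ``chat.services.embeddings.generate_embedding``: the names ``StubClient`` and ``generate_embedding_sketch`` are hypothetical, the warning text is invented for illustration, and the 384 dimension is only an assumption inferred from the ``pseudo-sha256-384`` model name.

```python
import asyncio
import logging

logger = logging.getLogger("embeddings_sketch")

# Assumption: dimension inferred from the "pseudo-sha256-384" model name.
EMBEDDING_DIM = 384


class StubClient:
    """Hypothetical stand-in for a client whose embed() is a stub."""

    async def embed(self, text: str, *, model: str) -> list[float]:
        raise NotImplementedError("embeddings unsupported by this provider")


async def generate_embedding_sketch(client, text: str, *, model: str) -> list[float]:
    """Return the client's embedding, or a zero vector if embed() is a stub."""
    try:
        return await client.embed(text, model=model)
    except NotImplementedError as exc:
        # Log a warning so the misconfiguration is visible, then degrade
        # instead of failing the request.
        logger.warning("embed() unavailable (%s); using zero vector", exc)
        return [0.0] * EMBEDDING_DIM


vector = asyncio.run(
    generate_embedding_sketch(StubClient(), "hello", model="any-model")
)
```

Catching only ``NotImplementedError`` keeps the fallback narrow: a provider that does implement ``embed()`` but fails at request time would still surface its own exception rather than silently producing zeros.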