docs: clarify FeatherlessClient.embed() rationale (verified 500 + empty embedding catalog)
Updates the docstring and the test docstring for the NotImplementedError stub shipped in T112 (Phase 4.5). The original wording said Featherless 'does not expose /v1/embeddings'. Testing showed the endpoint is routed (no 404) but always returns HTTP 500 with type='completions_error' for every model tried (text-embedding-3-small, BAAI/bge-small-en-v1.5, sentence-transformers/all-MiniLM-L6-v2, etc.), and /v1/models has no embedding-class entries. Stub behavior is unchanged.
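The second check in the commit message (no embedding-class entries under /v1/models) can be re-run against a saved response. The sketch below makes several assumptions. It assumes the body follows the OpenAI list shape (`{"data": [{"id": ...}, ...]}`). It assumes each entry carries its class label under a field called `model_class` here. The marker substrings are a hypothetical heuristic, not Featherless's own taxonomy. The HTTP fetch is left out so the function stays pure.

```python
from typing import Any, Iterable


def embedding_class_entries(
    catalog: dict[str, Any],
    class_key: str = "model_class",  # ASSUMPTION: name of the per-model class field
    markers: Iterable[str] = ("embed", "bge", "minilm"),  # hypothetical heuristic
) -> list[str]:
    """Return ids of catalog entries that look like embedding models.

    ``catalog`` is the parsed JSON body of an OpenAI-style ``GET /v1/models``
    response: ``{"data": [{"id": ..., <class_key>: ...}, ...]}``.
    """
    wanted = tuple(m.lower() for m in markers)
    hits: list[str] = []
    for entry in catalog.get("data", []):
        # Match against both the class label and the model id, lowercased.
        label = f"{entry.get(class_key, '')} {entry.get('id', '')}".lower()
        if any(m in label for m in wanted):
            hits.append(entry.get("id", ""))
    return hits


# Illustrative catalog with chat/completion classes only (ids and classes are made up).
sample = {
    "data": [
        {"id": "example/llama3-chat", "model_class": "llama3-8b"},
        {"id": "example/gemma3-chat", "model_class": "gemma3-27b"},
    ]
}
print(embedding_class_entries(sample))  # → []
```

An empty list matches the "no embedding-class entries" finding. Any hit would be worth probing individually against /v1/embeddings.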
+19 -11
@@ -55,24 +55,32 @@ class FeatherlessClient:
                 yield delta
 
     async def embed(self, text: str, *, model: str) -> list[float]:
-        """Embeddings via Featherless — currently unsupported.
+        """Embeddings via Featherless — unsupported in practice.
 
         T112 (Phase 4.5) extends the LLMClient Protocol with ``embed()``
         for a future real-embedding swap. Featherless's OpenAI-compatible
-        surface does NOT expose ``/v1/embeddings`` at the time of writing,
-        so this implementation raises ``NotImplementedError`` rather than
-        attempting a request that would 404. The
+        surface routes ``/v1/embeddings`` (no 404), but every request
+        returns HTTP 500 ``{"error": {"type": "completions_error", ...}}``
+        — including standard names like ``text-embedding-3-small`` and
+        ``BAAI/bge-small-en-v1.5``. ``/v1/models`` confirms it: the
+        catalog has no embedding-class entries, only chat/completion
+        classes (``llama3-*``, ``gemma3-*``, ``glm5-*``, etc.).
+
+        Rather than ship a request that always 500s, this implementation
+        raises ``NotImplementedError``. The
         :func:`chat.services.embeddings.generate_embedding` wrapper
-        catches this and degrades to the existing zero-vector fallback
+        catches it and degrades to the existing zero-vector fallback
         (with the T107 warning), so misconfigured callers fail loudly in
         logs but the request path keeps working.
 
-        If Featherless ships embeddings, swap the body for an
-        ``self._client.embeddings.create(model=..., input=...)`` call
-        guarded by ``self._sem()`` (mirrors ``generate``/``stream``).
+        For real embeddings, configure a different provider (OpenAI
+        direct, Cohere, Voyage, Together, self-hosted Ollama /
+        sentence-transformers). The Mock + routing seam from T112 keeps
+        the swap to a one-class change in ``chat/llm/``.
         """
         raise NotImplementedError(
-            "Featherless does not expose /v1/embeddings; "
-            "configure a different embedding provider or stick with "
-            "the default pseudo-sha256-384 model."
+            "Featherless /v1/embeddings always returns 500 "
+            '("completions_error") and the model catalog has no '
+            "embedding class; configure a different embedding provider "
+            "or stick with the default pseudo-sha256-384 model."
         )

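The docstring describes a degrade path: the `generate_embedding` wrapper catches `NotImplementedError` and falls back to a zero vector plus the T107 warning. That path can be sketched as follows. Several names and values are assumptions for illustration: `generate_embedding_sketch`, the `EmbeddingClient` protocol, the stub class, the log text, and the 384 fallback width. The real wrapper's signature and warning text are not shown in this commit.

```python
import asyncio
import logging
from typing import Protocol

logger = logging.getLogger(__name__)

# ASSUMPTION: the fallback width matches the 384-wide default
# "pseudo-sha256-384" model named in the error message.
FALLBACK_DIM = 384


class EmbeddingClient(Protocol):
    async def embed(self, text: str, *, model: str) -> list[float]: ...


async def generate_embedding_sketch(
    client: EmbeddingClient, text: str, *, model: str, dim: int = FALLBACK_DIM
) -> list[float]:
    """Hypothetical stand-in for chat.services.embeddings.generate_embedding."""
    try:
        return await client.embed(text, model=model)
    except NotImplementedError as exc:
        # Fail loudly in the logs, but keep the request path working.
        logger.warning("embedding backend unavailable (%s); using zero vector", exc)
        return [0.0] * dim


class FeatherlessLikeStub:
    """Behaves like the stub in the diff: embed() always raises."""

    async def embed(self, text: str, *, model: str) -> list[float]:
        raise NotImplementedError("Featherless embeddings are unsupported")


vec = asyncio.run(
    generate_embedding_sketch(FeatherlessLikeStub(), "hello world", model="bge-small-en-v1.5")
)
print(len(vec), set(vec))  # → 384 {0.0}
```

Catching only `NotImplementedError` keeps genuine provider failures (timeouts, auth errors) visible instead of silently zeroing them as well.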
@@ -1,10 +1,12 @@
 """Tests for FeatherlessClient (Phase 4.5+).
 
 Phase 4.5 adds an ``embed()`` method to the LLMClient Protocol (T112).
-Featherless does not expose an OpenAI-compatible ``/v1/embeddings``
-endpoint, so its implementation deliberately raises
-``NotImplementedError`` to surface the gap clearly. The
-``generate_embedding`` wrapper catches this and degrades to the
+Featherless's OpenAI-compatible surface routes ``/v1/embeddings`` but
+every request returns HTTP 500 ``{"type": "completions_error"}`` (the
+router accepts the URL but the backend has no embedding handler), and
+``/v1/models`` lists no embedding-class models. The implementation
+raises ``NotImplementedError`` rather than ship a request that always
+errors; ``generate_embedding`` catches it and degrades to the
 zero-vector fallback (the existing T107 warning path).
 
 If/when Featherless ships embeddings, swap the body for a real call to

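The old method docstring (an `embeddings.create(...)` call guarded by `self._sem()`) and the test module's "swap the body for a real call" note point at the same shape for a real implementation. Below is a minimal sketch of an embed()-only client for a provider that speaks the OpenAI embeddings API, such as OpenAI direct. It assumes an SDK object shaped like `openai.AsyncOpenAI`: `await client.embeddings.create(model=..., input=...)` returns an object with `.data[0].embedding`. The class name and `max_concurrency` are hypothetical. The lazily created semaphore is only a stand-in for `_sem()`.

```python
import asyncio
from typing import Any, Optional


class OpenAICompatibleEmbedder:
    """Hypothetical embed()-only client for an OpenAI-compatible provider.

    ``sdk_client`` must expose ``await sdk_client.embeddings.create(
    model=..., input=...)`` returning an object with ``.data[0].embedding``,
    as ``openai.AsyncOpenAI`` does.
    """

    def __init__(self, sdk_client: Any, *, max_concurrency: int = 4) -> None:
        self._client = sdk_client
        self._max_concurrency = max_concurrency
        self._semaphore: Optional[asyncio.Semaphore] = None

    def _sem(self) -> asyncio.Semaphore:
        # Created lazily, on first use inside a running event loop.
        if self._semaphore is None:
            self._semaphore = asyncio.Semaphore(self._max_concurrency)
        return self._semaphore

    async def embed(self, text: str, *, model: str) -> list[float]:
        # Bound in-flight requests; same pattern the old docstring
        # describes for generate/stream.
        async with self._sem():
            resp = await self._client.embeddings.create(model=model, input=text)
        return list(resp.data[0].embedding)
```

With the `openai` package installed, wiring would look like `OpenAICompatibleEmbedder(AsyncOpenAI(api_key=...))` followed by `await embedder.embed(text, model="text-embedding-3-small")`. Per the new docstring, routing such a class in through the T112 seam should stay confined to `chat/llm/`.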
@@ -20,10 +22,11 @@ from chat.llm.featherless import FeatherlessClient
 
 @pytest.mark.asyncio
 async def test_featherless_embed_raises_not_implemented():
-    """Featherless does not expose ``/v1/embeddings`` — embed() must
-    raise ``NotImplementedError`` so callers (``generate_embedding``)
-    can degrade to the fallback zero vector + warning rather than
-    silently producing useless output."""
+    """Featherless's ``/v1/embeddings`` always 500s with
+    ``"completions_error"`` and its model catalog has no embedding
+    class — embed() must raise ``NotImplementedError`` so callers
+    (``generate_embedding``) can degrade to the fallback zero vector
+    + warning rather than silently producing useless output."""
     client = FeatherlessClient(api_key="test-key")
     with pytest.raises(NotImplementedError) as excinfo:
         await client.embed("hello world", model="bge-small-en-v1.5")

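The hunk ends before any assertions on `excinfo`, so this page does not show whether the test pins the new message. If it should, the checks would be assertions on `str(excinfo.value)` after the `with pytest.raises(...)` block. Below is a dependency-free sketch of those checks. It uses a hypothetical stub that reproduces only the post-change `embed()` body from this diff. It calls `asyncio.run` in place of the pytest-asyncio runner.

```python
import asyncio


class FeatherlessEmbedStub:
    """Hypothetical stub: reproduces only the post-change embed() body."""

    async def embed(self, text: str, *, model: str) -> list[float]:
        raise NotImplementedError(
            "Featherless /v1/embeddings always returns 500 "
            '("completions_error") and the model catalog has no '
            "embedding class; configure a different embedding provider "
            "or stick with the default pseudo-sha256-384 model."
        )


def check_embed_error_message() -> str:
    try:
        asyncio.run(FeatherlessEmbedStub().embed("hello world", model="bge-small-en-v1.5"))
    except NotImplementedError as exc:
        message = str(exc)
    else:
        raise AssertionError("embed() should raise NotImplementedError")
    # Pin both verified reasons plus the remediation hint.
    assert "completions_error" in message
    assert "no embedding class" in message
    assert "pseudo-sha256-384" in message
    return message
```

Asserting on substrings rather than the full sentence lets the wording drift without breaking the test, while still catching a regression to the old "does not expose" rationale.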