Experimental: Embeddings via Text Embeddings Inference
We deploy Text Embeddings Inference on top of Ray, so we can auto-scale and serve whatever model you request. Expect cold-start delays the first time a model is requested; models are kept "hot" for 60 minutes after their last use before being purged.
API reference: https://huggingface.github.io/text-embeddings-inference/
Background information: https://github.com/huggingface/text-embeddings-inference
Recommended model
My personal favorite open embeddings model (as of Apr 4, 2024) is nomic-ai/nomic-embed-text-v1.5.
Usage
The base endpoint for embeddings is https://api.ncsa.ai/llm/v1/embeddings
curl https://api.ncsa.ai/llm/v1/embeddings \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "nomic-ai/nomic-embed-text-v1.5",
    "input": "What is Deep Learning?"
  }'
Python
from openai import OpenAI

client = OpenAI(api_key="empty", base_url="https://api.ncsa.ai/llm/v1/")

response = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input="What is Deep Learning?",
)
print(response.data[0].embedding)
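Embedding vectors returned by the endpoint are typically compared with cosine similarity, e.g. to rank documents against a query. A minimal sketch of that comparison step, using toy vectors in place of real API responses (pure Python, no network call):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of magnitudes; assumes non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for response.data[i].embedding values.
query = [0.1, 0.3, 0.5]
doc = [0.2, 0.1, 0.4]
print(cosine_similarity(query, doc))
```

In practice you would embed the query and each document with `client.embeddings.create`, then rank documents by their similarity to the query vector.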