Experimental: Embeddings via Text Embeddings Inference

We deploy Text Embeddings Inference on top of Ray, so we can auto-scale and serve whatever model you request. Expect a cold-start delay the first time a model is requested; models are kept "hot" for 60 minutes after their last use before being purged.

My personal favorite open embedding model (as of Apr 4, 2024) is nomic-ai/nomic-embed-text-v1.5.

Usage

The base endpoint for HuggingFace embeddings is https://api.ncsa.ai/llm/v1/embeddings

curl https://api.ncsa.ai/llm/v1/embeddings \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
     "model": "nomic-ai/nomic-embed-text-v1.5",
     "input": "What is Deep Learning?"
   }'

🐍 Python

from openai import OpenAI
client = OpenAI(api_key="empty", base_url="https://api.ncsa.ai/llm/v1/")

response = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input="What is Deep Learning?"
)

print(response.data[0].embedding)
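Once you have embeddings back, a common next step is to compare them with cosine similarity. A minimal stdlib-only sketch; the short vectors here are toy stand-ins for the real response.data[0].embedding values returned by the API:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings of two texts.
query_vec = [0.1, 0.3, 0.5]
doc_vec = [0.2, 0.1, 0.4]
print(cosine_similarity(query_vec, doc_vec))
```

Higher values mean the two texts are closer in the embedding space; a vector compared with itself scores 1.0.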
