Experimental: Embeddings via Text Embedding Inference


We deploy Text Embedding Inference on top of Ray so we can auto-scale and serve whatever model you request. Expect cold-start latency on the first request to a model: models are "kept hot" for 60 minutes after their last use, then purged.
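Because a cold start can make the first request slow or cause it to time out, it can help to wrap the call in a simple retry with exponential backoff. A minimal sketch (the `flaky` function below is a stand-in; in practice you would wrap the embeddings request shown later on this page):

```python
import time

def with_retries(fn, attempts=3, base_delay=2.0):
    """Call fn(), retrying with exponential backoff on failure.

    Useful for the first request after a cold start, which may
    fail or time out while the model is still being loaded.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Stand-in for a request that fails twice while the model loads:
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("model still loading")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # prints "ok" after two retries
```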

  • API reference: https://huggingface.github.io/text-embeddings-inference/

  • Background information: https://github.com/huggingface/text-embeddings-inference

Recommended model

My personal favorite open embeddings model (as of Apr 4, 2024) is nomic-ai/nomic-embed-text-v1.5.

Usage

The base endpoint for Hugging Face embeddings is https://api.ncsa.ai/llm/v1/embeddings:

curl https://api.ncsa.ai/llm/v1/embeddings \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
     "model": "nomic-ai/nomic-embed-text-v1.5",
     "input": "What is Deep Learning?"
   }'

🐍 Python

from openai import OpenAI

# Point the OpenAI client at the NCSA.ai base URL.
client = OpenAI(api_key="empty", base_url="https://api.ncsa.ai/llm/v1/")

response = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input="What is Deep Learning?"
)

print(response.data[0].embedding)
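Once you have embedding vectors, semantic similarity between texts is usually computed as cosine similarity. A minimal stdlib-only helper; the toy vectors below are placeholders for real `response.data[i].embedding` values:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings:
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```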