Experimental: Embeddings via Text Embedding Inference
Last updated
We deploy Text Embedding Inference on top of Ray, so we can auto-scale and deploy whichever model you request. Expect a cold-start delay when the model you request is not already running. Models are kept hot for 60 minutes after their last use, then purged.
API reference:
Background information:
My personal favorite open embeddings model (as of Apr 4, 2024) is
The base endpoint for HuggingFace embeddings is https://api.ncsa.ai/llm/v1/embeddings
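As a rough sketch of how a client might call this endpoint, the Python below builds and sends a POST request using only the standard library. Only the URL comes from this page. The request body shape (`model` and `input` fields), the `Authorization: Bearer` header, and the helper names `build_embedding_request` and `fetch_embeddings` are assumptions made for illustration, so check them against the API reference before relying on them.

```python
import json
import urllib.request

# Endpoint as given on this page.
EMBEDDINGS_URL = "https://api.ncsa.ai/llm/v1/embeddings"


def build_embedding_request(texts, model, api_key):
    """Build (but do not send) a POST request for the embeddings endpoint.

    ASSUMPTION: the service accepts a JSON body with "model" and "input"
    fields and a Bearer token in the Authorization header. Neither is
    confirmed by this page.
    """
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def fetch_embeddings(texts, model, api_key, timeout=300):
    """Send the request and return the parsed JSON response.

    The generous default timeout leaves room for a cold start when the
    requested model is not already running.
    """
    request = build_embedding_request(texts, model, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

To try it, call `fetch_embeddings(["some text"], "<your-model-name>", "<your-api-key>")`, substituting a real model identifier and key. Keeping request construction separate from sending makes it easy to inspect the payload without touching the network.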