Simple Ollama hosting

We have a small 2x A100 (40 GB) server with a 100% uptime Ollama instance.

This service is no longer publicly available. It's being used in production for UIUC.chat. For stability, we cannot allow arbitrary use.

Only use llama3.1:70b and nomic-embed-text:v1.5. Requesting any other model will cause "thrashing": there is not enough GPU memory to hold additional models, so nobody's jobs will complete. Do not /pull new models.
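
If you want to confirm which models are already loaded before sending requests, the Ollama Python client can list them. This is a minimal sketch, assuming the ollama package is installed; it only reads the server's model list and does not pull anything.

# python
from ollama import Client

client = Client(host='https://ollama.ncsa.ai')

# Show the models already present on the server; this does not pull anything new.
print(client.list())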

Examples

Llama 3.1 70B Instruct

# bash
curl https://ollama.ncsa.ai/api/chat -d '{
  "model": "llama3.1:70b",
  "messages": [
    { "role": "user", "content": "Write a long detailed bash program" }
  ]
}'

# python 
from ollama import Client

# Point the client at the hosted instance
client = Client(host='https://ollama.ncsa.ai')
response = client.chat(model='llama3.1:70b', messages=[
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
])
print(response['message']['content'])
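
For long generations you may prefer to receive tokens as they are produced instead of waiting for one blocking response. A minimal streaming sketch with the same client, assuming the ollama Python package (stream=True returns a generator of partial responses):

# python
from ollama import Client

client = Client(host='https://ollama.ncsa.ai')

# stream=True yields chunks as the model generates them
stream = client.chat(
    model='llama3.1:70b',
    messages=[{'role': 'user', 'content': 'Write a long detailed bash program'}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries the next piece of the assistant's message
    print(chunk['message']['content'], end='', flush=True)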

Text embeddings

# bash
curl https://ollama.ncsa.ai/api/embeddings -d '{
  "model": "nomic-embed-text:v1.5",
  "prompt": "The sky is blue because of Rayleigh scattering"
}'

# python 
from ollama import Client

client = Client(host='https://ollama.ncsa.ai')
response = client.embeddings(model='nomic-embed-text:v1.5', prompt='The sky is blue because of Rayleigh scattering')
print(response['embedding'])  # list of floats
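
Embeddings are most useful when compared. Below is a small, illustrative sketch (not part of the hosted service) that embeds two sentences with the same model and computes their cosine similarity using only the standard library:

# python
import math

from ollama import Client

client = Client(host='https://ollama.ncsa.ai')

def embed(text: str) -> list[float]:
    # /api/embeddings returns a single vector under the 'embedding' key
    return client.embeddings(model='nomic-embed-text:v1.5', prompt=text)['embedding']

a = embed('The sky is blue because of Rayleigh scattering')
b = embed('Why does the sky look blue?')

# Cosine similarity: dot(a, b) / (|a| * |b|)
dot = sum(x * y for x, y in zip(a, b))
norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(x * x for x in b))
print(round(dot / (norm_a * norm_b), 3))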
