Simple Ollama hosting

We have a small 2x A100 (40 GB) server with a 100% uptime Ollama instance.

This service is no longer publicly available. It's being used in production for UIUC.chat. For stability, we cannot allow arbitrary use.

Only use llama3.1:70b and nomic-embed-text:v1.5. Requesting any other model will cause "thrashing": there is not enough GPU memory to hold additional models, so nobody's jobs will complete. Do not /pull new models.
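
If you want to confirm which models are already loaded before sending requests, the Ollama Python client can list them. This is a minimal sketch, assuming the ollama package is installed; it only reads the server's model list and does not pull anything.

# python
from ollama import Client

client = Client(host='https://ollama.ncsa.ai')

# Show the models already present on the server; this does not pull anything new.
print(client.list())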

Examples

Llama 3.1 70B Instruct

# bash
curl https://ollama.ncsa.ai/api/chat -d '{
  "model": "llama3.1:70b",
  "messages": [
    { "role": "user", "content": "Write a long detailed bash program" }
  ]
}'

# python 
from ollama import Client

# Point the client at the hosted instance
client = Client(host='https://ollama.ncsa.ai')
response = client.chat(model='llama3.1:70b', messages=[
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
])
print(response['message']['content'])
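
For long generations you may prefer to receive tokens as they are produced instead of waiting for one blocking response. A minimal streaming sketch with the same client, assuming the ollama Python package (stream=True returns a generator of partial responses):

# python
from ollama import Client

client = Client(host='https://ollama.ncsa.ai')

# stream=True yields chunks as the model generates them
stream = client.chat(
    model='llama3.1:70b',
    messages=[{'role': 'user', 'content': 'Write a long detailed bash program'}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries the next piece of the assistant's message
    print(chunk['message']['content'], end='', flush=True)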

Text embeddings

# bash
curl https://ollama.ncsa.ai/api/embeddings -d '{
  "model": "nomic-embed-text:v1.5",
  "prompt": "The sky is blue because of Rayleigh scattering"
}'

# python 
from ollama import Client

client = Client(host='https://ollama.ncsa.ai')
response = client.embeddings(model='nomic-embed-text:v1.5', prompt='The sky is blue because of Rayleigh scattering')
print(response['embedding'])  # list of floats
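
Embeddings are most useful when compared. Below is a small, illustrative sketch (not part of the hosted service) that embeds two sentences with the same model and computes their cosine similarity using only the standard library:

# python
import math

from ollama import Client

client = Client(host='https://ollama.ncsa.ai')

def embed(text: str) -> list[float]:
    # /api/embeddings returns a single vector under the 'embedding' key
    return client.embeddings(model='nomic-embed-text:v1.5', prompt=text)['embedding']

a = embed('The sky is blue because of Rayleigh scattering')
b = embed('Why does the sky look blue?')

# Cosine similarity: dot(a, b) / (|a| * |b|)
dot = sum(x * y for x, y in zip(a, b))
norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(x * x for x in b))
print(round(dot / (norm_a * norm_b), 3))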
