Simple Ollama hosting
We run a small 2x A100 (40 GB) server with an always-on Ollama instance.
Ollama REST API Docs
Ollama Python API Docs
This service is no longer publicly available. It's being used in production for UIUC.chat. For stability, we cannot allow arbitrary use.
Only use llama3.1:70b and nomic-embed-text:v1.5.
Requesting any other model will cause "thrashing": there isn't enough GPU memory for additional models, and nobody's jobs will complete. Do not /pull new models.
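If you want to confirm what's already loaded before sending requests, the Ollama Python client can list the server's models. A minimal sketch (the exact response shape varies between ollama package versions, so this just prints it for inspection):

# python
from ollama import Client

client = Client(host='https://ollama.ncsa.ai')

# List the models already available on the server; inspect the output
# rather than pulling anything new.
print(client.list())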
Examples
Llama 3.1 70B Instruct
# bash
curl https://ollama.ncsa.ai/api/chat -d '{
  "model": "llama3.1:70b",
  "messages": [
    { "role": "user", "content": "Write a long detailed bash program" }
  ]
}'
# python
from ollama import Client

client = Client(host='https://ollama.ncsa.ai')
response = client.chat(model='llama3.1:70b', messages=[
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
])
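The chat call also supports streaming, which yields tokens as they are generated instead of one blocking response. A minimal sketch using the same client:

# python
from ollama import Client

client = Client(host='https://ollama.ncsa.ai')

# Stream the reply chunk by chunk.
stream = client.chat(
    model='llama3.1:70b',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)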
Text embeddings
# bash
curl https://ollama.ncsa.ai/api/embeddings -d '{
  "model": "nomic-embed-text:v1.5",
  "prompt": "The sky is blue because of Rayleigh scattering"
}'
# python
from ollama import Client

client = Client(host='https://ollama.ncsa.ai')
client.embeddings(model='nomic-embed-text:v1.5', prompt='The sky is blue because of Rayleigh scattering')
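A typical next step is comparing two embeddings. A minimal sketch using cosine similarity, assuming the client exposes the vector under 'embedding' the same way the REST endpoint does:

# python
import math
from ollama import Client

client = Client(host='https://ollama.ncsa.ai')

def embed(text):
    # nomic-embed-text:v1.5 returns a single vector under 'embedding'
    return client.embeddings(model='nomic-embed-text:v1.5', prompt=text)['embedding']

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine(embed('The sky is blue because of Rayleigh scattering'),
             embed('Why is the sky blue?')))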