We have a small server with 2x A100 (40 GB) GPUs running an always-on Ollama instance.
Ollama
This service is no longer publicly available: it is used in production for UIUC.chat, and for stability we cannot allow arbitrary use.
Use only llama3.1:70b and nomic-embed-text:v1.5.
Requesting any other model causes "thrashing": there is not enough GPU memory to hold additional models, so Ollama keeps swapping models in and out and nobody's jobs complete. Do not /pull new models.
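To double-check what is on the server before sending a request, Ollama's /api/tags endpoint lists the models it already has. A minimal sketch, assuming the requests package is installed (the response shape follows the standard Ollama REST API):
# python
import requests

# GET /api/tags lists the models already present on the server.
# Stick to the two allowed names; anything else forces a pull or a swap.
resp = requests.get('https://ollama.ncsa.ai/api/tags', timeout=10)
for model in resp.json()['models']:
    print(model['name'])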
Examples
Llama 3.1 70B instruct
# bash
curl https://ollama.ncsa.ai/api/chat -d '{
  "model": "llama3.1:70b",
  "messages": [
    { "role": "user", "content": "Write a long detailed bash program" }
  ]
}'
# python
from ollama import Client
client = Client(host='https://ollama.ncsa.ai')
response = client.chat(model='llama3.1:70b', messages=[
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
])
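Waiting on the full response from a 70B model can be slow; the same client can stream tokens as they are generated. A minimal sketch using the stream flag of the official ollama Python client:
# python
from ollama import Client

client = Client(host='https://ollama.ncsa.ai')

# stream=True yields partial responses as the model produces them.
stream = client.chat(
    model='llama3.1:70b',
    messages=[{'role': 'user', 'content': 'Write a long detailed bash program'}],
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)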
Text embeddings
# bash
curl https://ollama.ncsa.ai/api/embeddings -d '{
  "model": "nomic-embed-text:v1.5",
  "prompt": "The sky is blue because of Rayleigh scattering"
}'
# python
from ollama import Client
client = Client(host='https://ollama.ncsa.ai')
client.embeddings(model='nomic-embed-text:v1.5', prompt='The sky is blue because of Rayleigh scattering')
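Embedding vectors are usually compared with cosine similarity. A small sketch on top of the same client, using only the standard library; the 'embedding' response key is the one documented by Ollama for this endpoint:
# python
import math
from ollama import Client

client = Client(host='https://ollama.ncsa.ai')

def embed(text):
    # /api/embeddings returns {'embedding': [...]} for a single prompt.
    return client.embeddings(model='nomic-embed-text:v1.5', prompt=text)['embedding']

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine(embed('The sky is blue because of Rayleigh scattering'),
             embed('Why is the sky blue?')))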