Simple Ollama hosting
We run an always-on Ollama instance on a small server with 2x A100 (40GB) GPUs.
Ollama REST API Docs
Ollama Python API Docs
This service is no longer publicly available; it is used in production by UIUC.chat, and for stability we cannot allow arbitrary use.
Use only these models:
llama3.1:70b
nomic-embed-text:v1.5
Requesting any other model will cause "thrashing": there is not enough GPU memory to hold additional models, so nobody's jobs will complete. Do not /pull new models.
Examples
Llama3 70b-instruct
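A minimal sketch of a chat request against the server's REST API, using only the Python standard library. The host URL below is a placeholder, not the real server address; substitute the one you were given.

```python
# Sketch: one chat-completion request to Ollama's /api/chat endpoint.
import json
import urllib.request

OLLAMA_HOST = "http://your-ollama-host:11434"  # placeholder -- use the real server address


def build_chat_request(prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/chat endpoint."""
    return {
        "model": "llama3.1:70b",  # the only chat model loaded on this server
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one complete JSON response instead of a token stream
    }


def chat(prompt: str) -> str:
    req = urllib.request.Request(
        f"{OLLAMA_HOST}/api/chat",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]


if __name__ == "__main__":
    print(chat("Why is the sky blue?"))
```

The official `ollama` Python client (linked above) wraps the same endpoint if you prefer not to build requests by hand.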
Text embeddings
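A matching sketch for embeddings via the REST API's /api/embeddings endpoint. Again, the host URL is a placeholder for the real server address.

```python
# Sketch: embed a string with nomic-embed-text via Ollama's /api/embeddings.
import json
import urllib.request

OLLAMA_HOST = "http://your-ollama-host:11434"  # placeholder -- use the real server address


def build_embed_request(text: str) -> dict:
    """Build the JSON body for Ollama's /api/embeddings endpoint."""
    return {
        "model": "nomic-embed-text:v1.5",  # the only embedding model loaded here
        "prompt": text,
    }


def embed(text: str) -> list:
    req = urllib.request.Request(
        f"{OLLAMA_HOST}/api/embeddings",
        data=json.dumps(build_embed_request(text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The response contains a single "embedding" vector of floats.
        return json.load(resp)["embedding"]


if __name__ == "__main__":
    vec = embed("Hello, world")
    print(len(vec))  # dimensionality of the embedding vector
```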