Simple Ollama hosting

We have a small 2x A100 (40 GB) server running an Ollama instance with 100% uptime.

You can request new models with the /api/pull endpoint; once pulled, they will be loaded and ready for use.
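
For example, a minimal sketch of pulling a model with the standard Ollama Python client (this is an illustrative assumption, not an endorsed workflow; the same thing can be done by POSTing to /api/pull directly):

# python
import ollama

# Point the client at the hosted instance (the module-level helpers default to localhost:11434).
client = ollama.Client(host='https://ollama.ncsa.ai')

# Ask the server to download a model; under the hood this calls the /api/pull endpoint.
client.pull('llama3:70b-instruct')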

Examples

Llama3 70b-instruct

# bash
curl https://ollama.ncsa.ai/api/chat -d '{
  "model": "llama3:70b-instruct",
  "messages": [
    { "role": "user", "content": "Write a long detailed bash program" }
  ]
}'

# python 
import ollama
ollama.chat(model='llama3:70b-instruct', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])
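
Note that the module-level ollama helpers talk to a local instance (localhost:11434) unless OLLAMA_HOST points elsewhere. A minimal sketch of targeting the hosted server explicitly and reading the reply, assuming the standard ollama.Client API:

# python
import ollama

client = ollama.Client(host='https://ollama.ncsa.ai')
response = client.chat(
    model='llama3:70b-instruct',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
)

# The assistant's reply text is under message.content.
print(response['message']['content'])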

Text embeddings

# bash
curl https://ollama.ncsa.ai/api/embeddings -d '{
  "model": "nomic-embed-text:v1.5",
  "prompt": "The sky is blue because of Rayleigh scattering"
}'

# python 
import ollama
ollama.embeddings(model='nomic-embed-text:v1.5', prompt='The sky is blue because of Rayleigh scattering')
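
As with chat, a minimal sketch of calling the hosted server from Python and reading the vector, which the API returns under the "embedding" key:

# python
import ollama

client = ollama.Client(host='https://ollama.ncsa.ai')
response = client.embeddings(
    model='nomic-embed-text:v1.5',
    prompt='The sky is blue because of Rayleigh scattering',
)

# response['embedding'] is a list of floats (one vector for the whole prompt).
vector = response['embedding']
print(len(vector))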
