
Ollama with CUDA

Why Ollama? It makes running local LLMs easy.

https://github.com/ollama/ollama


ollama:

  • URL: http://ollama-service.ollama.svc.cluster.local:11434 (Kubernetes-internal)
  • HTTP ingress: http://ollama.uaiso.lan
  • no API key required
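Once the service is up (deployment steps below), either endpoint can be exercised with plain HTTP. As a sketch, listing the installed models through the ingress, assuming the `ollama.uaiso.lan` hostname resolves on your network:

```shell
# List models available on the server.
# GET /api/tags is part of the standard Ollama HTTP API; no API key is needed.
curl -s http://ollama.uaiso.lan/api/tags
```

The same request works against the cluster-internal URL from any pod inside the cluster.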

requirements: a cluster node with an NVIDIA GPU, working NVIDIA drivers, and the NVIDIA device plugin for Kubernetes (so pods can request GPU resources).

Deploy Ollama with CUDA support:

kubectl apply -f ollama-cuda.yaml
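A quick way to verify the rollout and GPU visibility, assuming the manifest creates a StatefulSet named `ollama` in the `ollama` namespace (matching the `ollama-0` pod used elsewhere in this guide):

```shell
# Check that the pod is running and ready.
kubectl -n ollama get pods

# Confirm the GPU is visible inside the container; nvidia-smi should
# list your GPU and its VRAM if CUDA support is wired up correctly.
kubectl -n ollama exec ollama-0 -- nvidia-smi
```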

Choose a model whose size fits your GPU's VRAM:

https://uaiso-serious.github.io/ollama-helper/

Example: pull the 3B-parameter Llama 3.2 model:

kubectl -n ollama exec ollama-0 -- bash -c "ollama pull llama3.2:3b"
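Once the model is pulled, a minimal smoke test via the HTTP API; the `/api/generate` endpoint is part of the standard Ollama API, and this uses the cluster-internal service URL, so run it from a pod inside the cluster (or substitute the ingress URL):

```shell
# Ask the model a question and get a single JSON response.
# "stream": false returns one JSON object instead of a stream of chunks.
curl -s http://ollama-service.ollama.svc.cluster.local:11434/api/generate \
  -d '{"model": "llama3.2:3b", "prompt": "Why is the sky blue?", "stream": false}'
```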