This repository provides a minimal CPU-only Ollama Docker image, specifically designed to run on systems without GPU support. At just 70MB, this image is significantly smaller than the official Ollama image, which is around 4GB.
REPOSITORY   TAG      IMAGE ID       CREATED       SIZE
ollama       latest   b99944c07117   3 hours ago   69.3MB
- This image is based on wolfi.dev rather than Alpine; an Alpine-based build may come later
- Big thanks to @kth8 for the help ( ollama/ollama#7184 (comment) )
- Lightweight: The official Ollama image is over 4GB in size, which can be overkill for systems that only need CPU-based processing. This image is only 70MB, making it much faster to download and deploy.
- CPU-only Support: This image is tailored for systems without GPUs. It lets you run Ollama efficiently, even in basic or resource-constrained environments, without needing specialized hardware.
- Run Anywhere: Whether you're working on local servers, edge devices, or cloud environments that don't offer GPU resources, this image allows you to run Ollama anywhere, focusing purely on CPU-based operations.
- Pull the image
docker pull alpine/ollama
- Run the service with the API exposed
docker rm -f ollama
docker run -d -p 11434:11434 -v ~/.ollama:/root/.ollama --name ollama alpine/ollama
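Before pulling a model, you can verify the server is reachable. One optional check (not part of the steps above) is to look at the container logs and query the Ollama API's version endpoint:
docker logs ollama
curl http://localhost:11434/api/version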
- Download a model, for example llama3.2. This only needs to be run once; the model is saved locally so you can reuse it later.
docker exec -ti ollama ollama pull llama3.2
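To confirm the model was saved, you can list the models stored inside the container with the standard ollama list command:
docker exec -ti ollama ollama list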
If you don't want to download the model yourself, you can use the alpine/llama3.2
image directly. I created it with the llama3.2 model already included.
docker run -d -p 11434:11434 --name llama3.2 alpine/llama3.2
- Test its API service with curl
$ curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt":"Why is the sky blue?"
}'
{"model":"llama3.2","created_at":"2024-10-16T00:25:58.59931201Z","response":"The","done":false}
{"model":"llama3.2","created_at":"2024-10-16T00:25:58.695826838Z","response":" sky","done":false}
{"model":"llama3.2","created_at":"2024-10-16T00:25:58.780917761Z","response":" appears","done":false}
{"model":"llama3.2","created_at":"2024-10-16T00:25:58.992556209Z","response":" blue","done":false}
{"model":"llama3.2","created_at":"2024-10-16T00:25:59.085970606Z","response":" because","done":false}
{"model":"llama3.2","created_at":"2024-10-16T00:25:59.30869749Z","response":" of","done":false}
...
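By default /api/generate streams one JSON object per token, as shown above. If you would rather receive a single response object, the standard Ollama API accepts "stream": false in the request body, for example:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'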
If you monitor CPU usage while the model is generating, for example with htop, you will see it climb sharply.
You can also deploy an Ollama web UI to chat with the model directly. Many tools are available; I won't recommend any specific one.
This image can be deployed to any environment. For example, in a Kubernetes cluster you can use it to analyze logs or streamline log processing with a local LLM.
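As a rough sketch only (the manifest below is untested; the names, replica count, and resource choices are assumptions you should adapt), a minimal Deployment and Service for this image in Kubernetes might look like:
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: alpine/ollama        # the CPU-only image from this repository
        ports:
        - containerPort: 11434      # default Ollama API port
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  selector:
    app: ollama
  ports:
  - port: 11434
    targetPort: 11434
EOF
Other pods in the cluster can then reach the API at http://ollama:11434.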