
Commit 640866c
Merge pull request #307 from tokk-nv/feat-ollama-thor
Add Thor container method
2 parents c7dc551 + 5d36fe5

docs/tutorial_ollama.md — 117 additions, 28 deletions
@@ -10,19 +10,20 @@ In this tutorial, we introduce two installation methods: (1) the default native

* The `ollama` [client](#ollama-client) can run inside or outside a container after starting the [server](#ollama-server).
* You can also run an [Open WebUI server](#open-webui) to support web clients.
* Supports the latest models like [gpt-oss](https://ollama.com/library/gpt-oss){:target="_blank"}!

## Ollama Server

!!! abstract "What you need"

    1. One of the following Jetson devices:

        <span class="blobDarkGreen3">Jetson AGX Thor</span>
        <span class="blobDarkGreen4">Jetson AGX Orin (64GB)</span>
        <span class="blobDarkGreen5">Jetson AGX Orin (32GB)</span>
        <span class="blobLightGreen3">Jetson Orin NX (16GB)</span>
        <span class="blobLightGreen3">Jetson Orin Nano (8GB)</span>

    2. Running one of the following versions of [JetPack](https://developer.nvidia.com/embedded/jetpack){:target="_blank"}:

        <span class="blobPink1">JetPack 5 (L4T r35.x)</span>
@@ -32,14 +33,20 @@ In this tutorial, we introduce two installation methods: (1) the default native

        - `7GB` for `ollama` container image
        - Space for models (`>5GB`)

## (1) Native Install

!!! note

    The Ollama native installer does not support the Jetson AGX Thor Developer Kit yet.

    If you want to run Ollama on the Jetson AGX Thor Developer Kit, check out the [Ollama container](#2-docker-container-for-ollama) approach below.

Ollama's official installer already supports Jetson and can easily install Ollama with CUDA support.

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

![](./images/ollama-official-installer.png)
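
Before pulling any models, you can confirm the install with a quick check like the one below (a minimal sketch; it assumes the installer registered its usual `ollama` systemd service, as it does on other Ubuntu-based systems):

```bash
# Confirm the CLI is on PATH and the background server is running
# (assumes the standard `ollama` systemd unit created by the installer).
ollama --version
sudo systemctl status ollama --no-pager
```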
@@ -60,38 +67,120 @@ ollama

```bash
ollama run llama3.2:3b
```
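The same server that backs the interactive chat also exposes Ollama's REST API on port 11434 by default, so you can query it programmatically as well; a minimal sketch (assuming the `llama3.2:3b` model pulled above) is:

```bash
# Query the local Ollama server over its REST API (default port 11434).
# "stream": false returns the whole response as a single JSON object.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Say hello in one short sentence.",
  "stream": false
}'
```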

## (2) Docker container for `ollama`

=== ":material-numeric-7-box: JetPack 7 (Jetson Thor)"

    You can use this Docker container built to run Ollama on Jetson Thor.

    ```bash
    mkdir ~/ollama-data/
    docker run --rm -it -v ${HOME}/ollama-data:/data ghcr.io/nvidia-ai-iot/ollama:r38.2.arm64-sbsa-cu130-24.04
    ```

    It will take some time to pull (download) the container image.

    Once in the container, you will see something like this:

    ```bash
    Starting ollama server

    OLLAMA_HOST 0.0.0.0
    OLLAMA_LOGS /data/logs/ollama.log
    OLLAMA_MODELS /data/models/ollama/models


    ollama server is now started, and you can run commands here like 'ollama run gemma3'

    root@2a79cc8835d9:/#
    ```
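
    Note that `/data` inside the container is the `~/ollama-data/` directory you created on the host, so anything Ollama downloads outlives the `--rm` container. A quick way to see this from another terminal on the Jetson (a sketch, assuming the mount from the step above) is:

    ```bash
    # On the host: the directory mounted at /data holds the models and logs
    # the container downloads, so they persist across container restarts.
    ls ~/ollama-data/
    du -sh ~/ollama-data/models
    ```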

    Try running the GPT-OSS (20B parameter) model by issuing the command below.

    ```bash
    ollama run --verbose gpt-oss:20b
    ```

    It will download about 14 GB of weights, so this also takes some time.

    Once ready, it will show something like this:

    ```bash
    root@2a79cc8835d9:/# ollama run --verbose gpt-oss:20b
    pulling manifest
    pulling b112e727c6f1: 100% ▕███████████████████████████████████████████▏  13 GB
    pulling fa6710a93d78: 100% ▕███████████████████████████████████████████▏ 7.2 KB
    pulling f60356777647: 100% ▕███████████████████████████████████████████▏  11 KB
    pulling d8ba2f9a17b3: 100% ▕███████████████████████████████████████████▏   18 B
    pulling 55c108d8e936: 100% ▕███████████████████████████████████████████▏  489 B
    verifying sha256 digest
    writing manifest
    success
    >>> Send a message (/? for help)
    ```

    Try any prompt and you will get something like this.

    ```bash
    root@c11344f6bbbc:/# ollama run --verbose gpt-oss:20b
    >>> why is the sky blue in one sentence
    Thinking...
    We need to answer: "why is the sky blue in one sentence". Just one sentence. Provide explanation: Rayleigh scattering of sunlight,
    shorter wavelengths scatter more. We'll produce a single sentence. Let's give a concise explanation.
    ...done thinking.

    The sky looks blue because the Earth's atmosphere scatters shorter-wavelength (blue) light from the sun more efficiently than longer
    wavelengths, a phenomenon called Rayleigh scattering.

    total duration:       3.504445244s
    load duration:        225.399151ms
    prompt eval count:    76 token(s)
    prompt eval duration: 673.487645ms
    prompt eval rate:     112.85 tokens/s
    eval count:           88 token(s)
    eval duration:        2.603822053s
    eval rate:            33.80 tokens/s
    >>> Send a message (/? for help)
    ```

    You can finish the session by typing `/bye`.
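
    If you only want a one-shot answer (for example, to compare the tokens/s numbers across prompts), `ollama run` also accepts the prompt directly on the command line and returns to the shell when the response is done:

    ```bash
    # Non-interactive, single prompt: prints the answer (and the --verbose
    # timing stats) instead of opening an interactive chat session.
    ollama run --verbose gpt-oss:20b "why is the sky blue in one sentence"
    ```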
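
    The server inside this container binds to `0.0.0.0`, but the example above does not publish any ports, so clients outside the container (such as Open WebUI below) cannot reach it. If you need that, one option (a sketch, using Ollama's default API port) is to start the container with the port published:

    ```bash
    # Same container as above, but with the Ollama API port (11434) published
    # so clients outside the container can reach http://<jetson-ip>:11434
    docker run --rm -it \
      -v ${HOME}/ollama-data:/data \
      -p 11434:11434 \
      ghcr.io/nvidia-ai-iot/ollama:r38.2.arm64-sbsa-cu130-24.04
    ```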
=== ":material-numeric-6-box-outline: JetPack 6 (Jetson Orin)"
151+
152+
We can use `jetson-containers` to run Ollama.
153+
154+
```
155+
# models cached under jetson-containers/data
156+
jetson-containers run --name ollama $(autotag ollama)
157+
158+
# models cached under your user's home directory
159+
docker run --runtime nvidia --rm --network=host -v ~/ollama:/ollama -e OLLAMA_MODELS=/ollama dustynv/ollama:r36.2.0
160+
```
161+
162+
Running either of these will start the local Ollama server as a daemon in the background. It will save the models it downloads under your mounted `jetson-containers/data/models/ollama` directory (or another directory that you override with `OLLAMA_MODELS`)
163+
164+
Start the Ollama command-line chat client with your desired [model](https://ollama.com/library){:target="_blank"} (for example: `llama3`, `phi3`, `mistral`)
165+
166+
```
167+
# if running inside the same container as launched above
168+
/bin/ollama run phi3
169+
170+
# if launching a new container for the client in another terminal
171+
jetson-containers run $(autotag ollama) /bin/ollama run phi3
172+
```

    Or you can install Ollama's [binaries](https://github.com/ollama/ollama/releases){:target="_blank"} for arm64 outside of the container (without CUDA, which only the server needs):

    ```bash
    # download the latest ollama release for arm64 into /bin
    sudo wget https://github.com/ollama/ollama/releases/download/$(git ls-remote --refs --sort="version:refname" --tags https://github.com/ollama/ollama | cut -d/ -f3- | sed 's/-rc.*//g' | tail -n1)/ollama-linux-arm64 -O /bin/ollama
    sudo chmod +x /bin/ollama

    # use the client like normal outside the container
    /bin/ollama run phi3
    ```
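
    Whichever client you use, it talks to the server over the same API, so you can also point it at an Ollama server running elsewhere on your network via the `OLLAMA_HOST` environment variable (a sketch with a placeholder address):

    ```bash
    # Talk to a remote Ollama server instead of localhost
    # (replace the address with your Jetson's IP; 11434 is the default port).
    OLLAMA_HOST=192.168.1.100:11434 /bin/ollama run phi3
    ```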

## Open WebUI

@@ -105,4 +194,4 @@ You can then navigate your browser to `http://JETSON_IP:8080`, and create a fake

<img src="https://raw.githubusercontent.com/dusty-nv/jetson-containers/docs/docs/images/ollama_open_webui.jpg" width="800px"></img>

Ollama uses llama.cpp for inference; various API benchmarks and comparisons are provided on the [Llava](./tutorial_llava.md){:target="_blank"} page. It reaches roughly half of peak performance compared to faster APIs like [NanoLLM](./tutorial_nano-llm.md), but is generally considered fast enough for text chat.
