In this tutorial, we introduce two installation methods: (1) the default native installation and (2) the Docker container method.
* The `ollama` [client](#ollama-client) can run inside or outside the container after starting the [server](#ollama-server) (a quick connectivity check is sketched after this list).
* You can also run an [Open WebUI server](#open-webui) to support web clients.
* Supports the latest models like [gpt-oss](https://ollama.com/library/gpt-oss){:target="_blank"}!
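Once the server is up (by either method below), a quick way to confirm that a client outside the container can reach it is to hit the server's HTTP endpoint. This is only a sketch, assuming the default Ollama port of 11434 and a server running on the same device:

```bash
# the Ollama server answers plain HTTP on port 11434 by default;
# these checks work from the host or from another container on the device
curl http://localhost:11434/          # prints "Ollama is running"
curl http://localhost:11434/api/tags  # lists the models downloaded so far
```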
You can use this Docker container built to run Ollama on Jetson Thor.
```bash
mkdir ~/ollama-data/
docker run --rm -it -v ${HOME}/ollama-data:/data ghcr.io/nvidia-ai-iot/ollama:r38.2.arm64-sbsa-cu130-24.04
```
It will take some time to pull (download) the container image.
Once in the container, you will see something like this:
```bash
Starting ollama server
```
Running either of these will start the local Ollama server as a daemon in the background. It will save the models it downloads under your mounted `jetson-containers/data/models/ollama` directory (or another directory that you override with `OLLAMA_MODELS`).
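For example (the path below is purely illustrative), you could point `OLLAMA_MODELS` at a different drive before starting the server manually:

```bash
# illustrative only -- use any writable directory with enough free space
export OLLAMA_MODELS=/mnt/nvme/ollama-models
ollama serve   # the server will download and store models under $OLLAMA_MODELS
```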
Start the Ollama command-line chat client with your desired [model](https://ollama.com/library){:target="_blank"} (for example: `llama3`, `phi3`, or `mistral`):
```bash
# if running inside the same container as launched above
/bin/ollama run phi3
# if launching a new container for the client in another terminal
jetson-containers run $(autotag ollama) /bin/ollama run phi3
```
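The interactive CLI is not the only client: the same server also exposes a REST API on port 11434, so prompts can be scripted against it. A minimal sketch with `curl`, assuming the `phi3` model has already been pulled (the prompt is just an example):

```bash
# ask the running server for a one-shot (non-streaming) completion
curl http://localhost:11434/api/generate -d '{
  "model": "phi3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```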
Or you can install Ollama's [binaries](https://github.com/ollama/ollama/releases){:target="_blank"} for arm64 outside of the container (without CUDA, which only the server needs):
```bash
# download the latest ollama release for arm64 into /bin
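# one way to do this (illustrative only -- release asset names change over time,
# so check the releases page for the current arm64 tarball before copying this):
curl -fL -o /tmp/ollama-linux-arm64.tgz \
  https://github.com/ollama/ollama/releases/latest/download/ollama-linux-arm64.tgz
sudo tar -C /usr -xzf /tmp/ollama-linux-arm64.tgz   # unpacks the ollama binary into /usr/bin
ollama --version
```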
Ollama uses llama.cpp for inference, for which various API benchmarks and comparisons are provided on the [Llava](./tutorial_llava.md){:target="_blank"} page. It reaches roughly half of peak performance compared to faster APIs like [NanoLLM](./tutorial_nano-llm.md), but is generally fast enough for text chat.