docs/advanced-usage/local-models.md (106 changes: 5 additions & 101 deletions)
@@ -21,113 +21,17 @@ Roo Code currently supports two main local model providers:
1. **Ollama:** A popular open-source tool for running large language models locally. It supports a wide range of models.
2. **LM Studio:** A user-friendly desktop application that simplifies the process of downloading, configuring, and running local models. It also provides a local server that emulates the OpenAI API.

## Setting Up Ollama
## Setting Up Local Models

1. **Download and Install Ollama:** Download the Ollama installer for your operating system from the [Ollama website](https://ollama.com/). Follow the installation instructions, then make sure Ollama is running.
For detailed setup instructions, see:

```bash
ollama serve
```
* [Setting up Ollama](../providers/ollama)
* [Setting up LM Studio](../providers/lmstudio)
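
Once Ollama is installed and the server is running, a quick sanity check from the terminal can confirm that everything is in place (output will vary by version):

```bash
# Print the installed Ollama version
ollama --version

# The root endpoint responds with "Ollama is running" when the server is up
curl http://localhost:11434/
```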

2. **Download a Model:** Ollama supports many different models. You can find a list of available models on the [Ollama website](https://ollama.com/library). Some recommended models for coding tasks include:

* `codellama:7b-code` (good starting point, smaller)
* `codellama:13b-code` (better quality, larger)
* `codellama:34b-code` (even better quality, very large)
* `qwen2.5-coder:32b`
    * `mistral:7b-instruct` (good general-purpose model)
* `deepseek-coder:6.7b-base` (good for coding tasks)
* `llama3:8b-instruct-q5_1` (good for general tasks)

To download a model, open your terminal and run:

```bash
ollama pull <model_name>
```

For example:

```bash
ollama pull qwen2.5-coder:32b
```
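
After the download finishes, you can list the models available locally to confirm it is there:

```bash
# Show all models that have been pulled or created locally
ollama list
```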

3. **Configure the Model:** By default, Ollama uses a context window of 2048 tokens, which is too small for Roo Code requests. You need a context window of at least 12k tokens to get decent results, ideally 32k. To configure a model, you need to set its parameters and save a copy of it under a new name.

##### Using the Ollama runtime
Load the model (we will use `qwen2.5-coder:32b` as an example):

```bash
ollama run qwen2.5-coder:32b
```

At the interactive prompt, change the context size parameter:

```bash
/set parameter num_ctx 32768
```

Save the model with a new name:

```bash
/save your_model_name
```
##### Using the Ollama command line
Alternatively, you can write all your settings into a text file and generate the model from the command line.


Create a text file with the model settings and save it (e.g., `~/qwen2.5-coder-32k.txt`). Here we've only used the `num_ctx` parameter, but you can include more parameters on additional lines using the `PARAMETER name value` syntax; an example with an extra parameter is shown below.

```text
FROM qwen2.5-coder:32b
# sets the context window size to 32768; this controls how many tokens the LLM can use as context to generate the next token
PARAMETER num_ctx 32768
```
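
For instance, if you also wanted to lower the sampling temperature, the file could look like the following. The extra `temperature` parameter is purely illustrative; `num_ctx` is the only setting Roo Code needs here.

```text
FROM qwen2.5-coder:32b
# sets the context window size to 32768 tokens
PARAMETER num_ctx 32768
# optional: lower temperature for more deterministic output
PARAMETER temperature 0.2
```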
Change directory to the `.ollama/models` directory. On most Macs, that's `~/.ollama/models` by default (`%HOMEPATH%\.ollama` on Windows).

```bash
cd ~/.ollama/models
```

Create your model from the settings file you just created. The syntax is `ollama create (name you want for the new model) -f (settings file)`.

```bash
ollama create qwen2.5-coder-32k -f ~/qwen2.5-coder-32k.txt
```
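
Whichever method you used, you can verify that the saved model picked up the larger context window. This assumes the name `qwen2.5-coder-32k` from the command-line example; substitute whatever name you saved your model under:

```bash
# Print the parameters stored with the model; num_ctx should show 32768
ollama show qwen2.5-coder-32k --parameters
```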



4. **Configure Roo Code:**
* Open the Roo Code sidebar (<Codicon name="rocket" /> icon).
* Click the settings gear icon (<Codicon name="gear" />).
* Select "ollama" as the API Provider.
* Enter the Model name from the previous step (e.g., `your_model_name`) or choose it from the radio button list that should appear below `Model ID` if Ollama is currently running.
* (Optional) You can configure the base URL if you're running Ollama on a different machine. The default is `http://localhost:11434`.
* (Optional) Configure Model context size in Advanced settings, so Roo Code knows how to manage its sliding window.
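
If you want to confirm that Roo Code can reach Ollama at the configured base URL, you can query the server directly. The URL below assumes the default address; adjust the host and port if you changed them:

```bash
# Returns the models the Ollama server can serve; your saved model should appear in the list
curl http://localhost:11434/api/tags
```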

## Setting Up LM Studio

1. **Download and Install LM Studio:** Download LM Studio from the [LM Studio website](https://lmstudio.ai/).
2. **Download a Model:** Use the LM Studio interface to search for and download a model. Some recommended models include those listed above for Ollama. Look for models in the GGUF format.
3. **Start the Local Server:**
* In LM Studio, click the **"Local Server"** tab (the icon looks like `<->`).
* Select your downloaded model.
* Click **"Start Server"**.
4. **Configure Roo Code:**
* Open the Roo Code sidebar (<Codicon name="rocket" /> icon).
* Click the settings gear icon (<Codicon name="gear" />).
* Select "lmstudio" as the API Provider.
* Enter the Model ID. This should be the name of the model file you loaded in LM Studio (e.g., `codellama-7b.Q4_0.gguf`). LM Studio shows a list of "Currently loaded models" in its UI.
* (Optional) You can configure the base URL if you're running LM Studio on a different machine. The default is `http://localhost:1234`.
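
As with Ollama, you can check that the LM Studio server is reachable and see which model IDs it exposes by querying its OpenAI-compatible endpoint. The URL assumes the default port:

```bash
# Returns a JSON list of the models currently loaded in LM Studio
curl http://localhost:1234/v1/models
```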
Both providers offer similar capabilities but with different user interfaces and workflows. Ollama provides more control through its command-line interface, while LM Studio offers a more user-friendly graphical interface.

## Troubleshooting

* **"Please check the LM Studio developer logs to debug what went wrong":** This error usually indicates a problem with the model or its configuration in LM Studio. Try the following:
* Make sure the LM Studio local server is running and that the correct model is loaded.
* Check the LM Studio logs for any error messages.
* Try restarting the LM Studio server.
* Ensure your chosen model is compatible with Roo Code. Some very small models may not work well.
* Some models may require a larger context length.

* **"No connection could be made because the target machine actively refused it":** This usually means that the Ollama or LM Studio server isn't running, or is running on a different port/address than Roo Code is configured to use. Double-check the Base URL setting.

* **Slow Response Times:** Local models can be slower than cloud-based models, especially on less powerful hardware. If performance is an issue, try using a smaller model.
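
For Ollama specifically, it can also help to check whether the model is running on the GPU or has fallen back to the CPU, which is a common cause of slow responses. This assumes a reasonably recent Ollama version:

```bash
# Shows loaded models and whether they are running on GPU, CPU, or a mix of both
ollama ps
```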