RKLLAMA supports both the full model names used in the file system and simplified model names in the Ollama API style.
When using the Ollama-compatible API, you'll see model names in a simplified format:
```
family[-variant]:size
```

where:

- `family` is the model architecture (e.g., `qwen2.5`, `mistral`, `llama3`)
- `variant` (optional) is the model variant (e.g., `coder`, `math`, `instruct`)
- `size` is the parameter size (e.g., `3b`, `7b`)

The family and any variants appear before the colon, while only the parameter size appears after the colon.
- `qwen2.5:3b` - Qwen 2.5 with 3 billion parameters
- `mistral:7b` - Base Mistral 7B model
- `deepseek-coder:7b` - DeepSeek Coder variant with 7 billion parameters
- `deepseek-math:7b` - DeepSeek Math variant with 7 billion parameters
- `llama2-chat:7b` - Llama 2 Chat variant with 7 billion parameters
- `phi3:medium` - Phi-3 Medium model (the size is part of the family name)
- `qwen2.5-coder-instruct:7b` - Qwen 2.5 Coder with Instruct capabilities, 7B parameters

Multiple variants are combined with hyphens before the colon.
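The `family[-variant]:size` format above can be split mechanically. The sketch below is illustrative only; the variant keyword set is an assumption for the example, not RKLLAMA's actual table:

```python
# Illustrative subset of variant keywords; RKLLAMA's real list may differ.
VARIANTS = {"coder", "math", "instruct", "chat", "vision"}

def split_simplified(name: str):
    """Split a simplified name like 'qwen2.5-coder-instruct:7b'
    into (family, variants, size)."""
    base, _, size = name.partition(":")
    parts = base.split("-")
    # Hyphen-separated parts matching a known variant keyword are variants;
    # the rest form the family (a simplification for illustration).
    variants = [p for p in parts if p.lower() in VARIANTS]
    family = "-".join(p for p in parts if p.lower() not in VARIANTS)
    return family, variants, size
```

For example, `split_simplified("qwen2.5-coder-instruct:7b")` yields the family `qwen2.5`, the variants `coder` and `instruct`, and the size `7b`.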
Under the hood, RKLLAMA uses the full model names for file paths and model loading:
```
Qwen2.5-3B-Instruct-rk3588-w8a8-opt-0-hybrid-ratio-1.0
Llama-2-7B-Chat-rk3588-w4a16_g64-opt-0-hybrid-ratio-0.4
deepseek-coder-7b-instruct-v1.5-rk3588-w8a8-opt-1-hybrid-ratio-0.5
deepseek-math-7b-instruct-rk3588-w8a8-opt-1-hybrid-ratio-0.5
```
Full names typically include:
- Model architecture/family
- Optional variant (coder, math, etc.)
- Parameter size (e.g., 7B)
- Fine-tuning type (e.g., Instruct, Chat)
- Target platform (e.g., rk3588)
- Quantization details (e.g., w8a8, w4a16_g64)
- Optimization settings
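As a rough illustration, these components can be pulled out of a full name with a few regular expressions. The field names and patterns below are assumptions based on the examples above, not RKLLAMA's actual parser:

```python
import re

def parse_full_name(full: str) -> dict:
    """Extract recognizable components from a full RKLLM model name
    (illustrative sketch; field names are assumptions)."""
    info = {}
    m = re.search(r"rk\d+", full, re.IGNORECASE)          # target platform
    if m:
        info["platform"] = m.group(0).lower()
    m = re.search(r"w\d+a\d+(_g\d+)?", full, re.IGNORECASE)  # quantization
    if m:
        info["quantization"] = m.group(0).lower()
    m = re.search(r"opt-?(\d+)", full, re.IGNORECASE)     # optimization level
    if m:
        info["optimization"] = int(m.group(1))
    m = re.search(r"hybrid-ratio-([\d.]+)", full, re.IGNORECASE)
    if m:
        info["hybrid_ratio"] = float(m.group(1))
    m = re.search(r"(\d+(?:\.\d+)?)[bB]\b", full)         # parameter size
    if m:
        info["size"] = m.group(1).lower() + "b"
    return info
```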
RKLLAMA automatically handles conversion between simplified and full names with the following rules:
- **Model Family Detection**: recognizes common families such as `llama`, `llama2`, `llama3`, `mistral`, `qwen`, `qwen2.5`, `deepseek`, `phi`, `phi2`, `phi3`, etc.
- **Variant Detection**: extracts variants like `coder`, `math`, `instruct`, `chat`, `vision`, etc.; multiple variants are joined with hyphens (e.g., `coder-instruct`)
- **Parameter Size Detection**: looks for patterns like `7B`, `3b`, `1.5b` to determine the model size; the parameter size appears after the colon in simplified names
- **Simplified Format Construction**: builds `family-variant:size`, where variants are optional and multiple variants are hyphenated
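These detection rules can be sketched roughly as follows. The family and variant keyword tables are illustrative assumptions, and real names (e.g., `Llama-2-...`) may need more careful handling than this sketch provides:

```python
import re

# Illustrative keyword tables; longer names are listed first so that
# e.g. 'qwen2.5' matches before 'qwen'. RKLLAMA's actual tables may differ.
FAMILIES = ["qwen2.5", "qwen", "llama3", "llama2", "llama",
            "mistral", "deepseek", "phi3", "phi2", "phi"]
VARIANTS = ["coder", "math", "instruct", "chat", "vision"]

def simplify(full: str) -> str:
    """Sketch of the full-name -> simplified-name conversion rules."""
    lower = full.lower()
    # 1. Family detection: first known family found in the name.
    family = next((f for f in FAMILIES if f in lower), None)
    # 2. Variant detection: keep variants in a fixed order.
    variants = [v for v in VARIANTS if re.search(rf"\b{v}\b", lower)]
    # 3. Parameter size detection: patterns like 7B, 3b, 1.5b.
    m = re.search(r"(\d+(?:\.\d+)?)b\b", lower)
    size = m.group(1) + "b" if m else ""
    # 4. Construct family[-variant...]:size
    base = "-".join([family] + variants) if family else lower
    return f"{base}:{size}" if size else base
```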
You can use either format when making API requests.

Using the simplified name:

```json
{
  "model": "deepseek-coder:7b",
  "messages": [{"role": "user", "content": "Write a Python function to calculate Fibonacci numbers"}]
}
```

Using the full name:

```json
{
  "model": "deepseek-coder-7b-instruct-v1.5-rk3588-w8a8-opt-1-hybrid-ratio-0.5",
  "messages": [{"role": "user", "content": "Write a Python function to calculate Fibonacci numbers"}]
}
```
RKLLAMA will automatically resolve either format to the correct model.
When two models would simplify to the same name (e.g., models with the same family, variants, and parameter size), RKLLAMA uses a tiered approach to create unique names:

- **Differentiate by Quantization**: if one model uses w8a8 and another uses w8a8_g128, they'll be named `qwen2.5:7b` and `qwen2.5-w8a8-g128:7b`
- **Differentiate by Optimization Level**: if models have different optimization levels, they'll be named `qwen2.5:7b-opt0` and `qwen2.5-opt1:7b`
- **Differentiate by Hybrid Ratio**: if models have different hybrid ratios, they'll be named `qwen2.5:7b-r0.5` and `qwen2.5-r1.0:7b`
- **Numeric Suffix (Last Resort)**: if all other attributes are identical, a numeric suffix is added: `qwen2.5:7b` and `qwen2.5-1:7b`
This hierarchical approach maintains meaningful distinctions between similar models.
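The tiered fallback might look roughly like this in code. The attribute keys and the exact suffix placement are assumptions for illustration, not RKLLAMA's implementation:

```python
def disambiguate(base, size, models):
    """Assign unique simplified names to models that would otherwise all
    be called f'{base}:{size}'. Sketch of the tiered rules: the first
    model keeps the plain name; later ones get a suffix drawn from the
    first attribute that differs from the first model."""
    names = []
    first = models[0]
    for i, m in enumerate(models):
        if i == 0:
            names.append(f"{base}:{size}")
        elif m["quant"] != first["quant"]:
            # Tier 1: quantization (underscores become hyphens)
            names.append(f"{base}-{m['quant'].replace('_', '-')}:{size}")
        elif m["opt"] != first["opt"]:
            # Tier 2: optimization level
            names.append(f"{base}-opt{m['opt']}:{size}")
        elif m["ratio"] != first["ratio"]:
            # Tier 3: hybrid ratio
            names.append(f"{base}-r{m['ratio']}:{size}")
        else:
            # Tier 4: numeric suffix as a last resort
            names.append(f"{base}-{i}:{size}")
    return names
```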
RKLLAMA also supports finding models by close matches to their names. For example:
- Requesting "qwen3b" might match "qwen2.5:3b"
- Requesting "mistral7b-instruct" might match "mistral-instruct:7b"
This fuzzy matching makes it easier to find models without knowing their exact naming.
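One simple way to implement this kind of fuzzy matching (not necessarily RKLLAMA's own approach) is Python's `difflib` after normalising away punctuation:

```python
from difflib import get_close_matches

# Hypothetical catalogue of installed models, for illustration.
AVAILABLE = ["qwen2.5:3b", "mistral-instruct:7b", "deepseek-coder:7b"]

def find_model(query: str, available=AVAILABLE):
    """Resolve a loosely spelled model name to the closest available one."""
    def norm(s):
        # Drop separators so 'qwen3b' can be compared with 'qwen2.5:3b'.
        return s.lower().replace(":", "").replace("-", "").replace(".", "")
    by_norm = {norm(m): m for m in available}
    hits = get_close_matches(norm(query), list(by_norm), n=1, cutoff=0.6)
    return by_norm[hits[0]] if hits else None
```

With this sketch, `find_model("qwen3b")` resolves to `qwen2.5:3b`, and a query with no sufficiently close match returns `None`.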
To see all available models with their simplified names, use the `/v1/models` endpoint:

```shell
curl http://localhost:8080/v1/models
```