Commit b33e353

add doc

qinxuye committed Jan 31, 2025
1 parent 3b27005

Showing 1 changed file with 55 additions and 7 deletions:
doc/source/models/builtin/llm/qwen2-vl-instruct.rst
@@ -78,7 +78,23 @@ chosen quantization method from the options listed above::
xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 2 --model-format awq --quantization ${quantization}


-Model Spec 5 (pytorch, 7 Billion)
+Model Spec 5 (mlx, 2 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 2
- **Quantizations:** 4bit, 8bit
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen2-VL-2B-Instruct-{quantization}
- **Model Hubs**: `Hugging Face <https://huggingface.co/mlx-community/Qwen2-VL-2B-Instruct-{quantization}>`__, `ModelScope <https://modelscope.cn/models/okwinds/Qwen2-VL-2B-Instruct-MLX-8bit>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 2 --model-format mlx --quantization ${quantization}
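
For example, with the ``MLX`` engine and the ``4bit`` quantization listed above, the fully
substituted command is (one valid combination, as a sketch; adjust to your setup)::

    xinference launch --model-engine MLX --model-name qwen2-vl-instruct --size-in-billions 2 --model-format mlx --quantization 4bit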


Model Spec 6 (pytorch, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
@@ -94,7 +110,7 @@ chosen quantization method from the options listed above::
xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 7 --model-format pytorch --quantization ${quantization}


-Model Spec 6 (gptq, 7 Billion)
+Model Spec 7 (gptq, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
@@ -110,7 +126,7 @@ chosen quantization method from the options listed above::
xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 7 --model-format gptq --quantization ${quantization}


-Model Spec 7 (gptq, 7 Billion)
+Model Spec 8 (gptq, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
@@ -126,7 +142,7 @@ chosen quantization method from the options listed above::
xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 7 --model-format gptq --quantization ${quantization}


-Model Spec 8 (awq, 7 Billion)
+Model Spec 9 (awq, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
@@ -142,7 +158,23 @@ chosen quantization method from the options listed above::
xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 7 --model-format awq --quantization ${quantization}


-Model Spec 9 (pytorch, 72 Billion)
+Model Spec 10 (mlx, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 7
- **Quantizations:** 4bit, 8bit
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen2-VL-7B-Instruct-{quantization}
- **Model Hubs**: `Hugging Face <https://huggingface.co/mlx-community/Qwen2-VL-7B-Instruct-{quantization}>`__, `ModelScope <https://modelscope.cn/models/okwinds/Qwen2-VL-7B-Instruct-MLX-8bit>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 7 --model-format mlx --quantization ${quantization}
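
Depending on your Xinference version, you may also be able to query the valid engine, format, and
quantization combinations for this model before launching (a sketch; check ``xinference engine --help``
for the exact flags available in your installation)::

    xinference engine --model-name qwen2-vl-instruct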


Model Spec 11 (pytorch, 72 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
@@ -158,7 +190,7 @@ chosen quantization method from the options listed above::
xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 72 --model-format pytorch --quantization ${quantization}


-Model Spec 10 (awq, 72 Billion)
+Model Spec 12 (awq, 72 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** awq
@@ -174,7 +206,7 @@ chosen quantization method from the options listed above::
xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 72 --model-format awq --quantization ${quantization}


-Model Spec 11 (gptq, 72 Billion)
+Model Spec 13 (gptq, 72 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** gptq
@@ -189,3 +221,19 @@ chosen quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 72 --model-format gptq --quantization ${quantization}


Model Spec 14 (mlx, 72 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 72
- **Quantizations:** 4bit, 8bit
- **Engines**: MLX
- **Model ID:** mlx-community/Qwen2-VL-72B-Instruct-{quantization}
- **Model Hubs**: `Hugging Face <https://huggingface.co/mlx-community/Qwen2-VL-72B-Instruct-{quantization}>`__, `ModelScope <https://modelscope.cn/models/okwinds/Qwen2-VL-72B-Instruct-MLX-{quantization}>`__

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 72 --model-format mlx --quantization ${quantization}
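
If the Xinference server is not running on the default endpoint, pass it explicitly via
``-e``/``--endpoint`` (a sketch, assuming a server reachable at ``http://127.0.0.1:9997``)::

    xinference launch -e http://127.0.0.1:9997 --model-engine MLX --model-name qwen2-vl-instruct --size-in-billions 72 --model-format mlx --quantization 4bit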
