DOC: Register custom embedding model (#683)
ChengjieLi28 authored Nov 24, 2023
1 parent 3ddc112 commit 8fd2e3b
Showing 2 changed files with 93 additions and 9 deletions.
58 changes: 58 additions & 0 deletions doc/source/models/builtin/qwen-chat.rst
@@ -43,3 +43,61 @@ Execute the following command to launch the model::
.. note::

   4-bit and 8-bit quantization are not supported on macOS.

Model Spec 3 (ggmlv3, 7 Billion)
++++++++++++++++++++++++++++++++

- **Model Format:** ggmlv3
- **Model Size (in billions):** 7
- **Quantizations:** q4_0
- **Model ID:** Xorbits/qwen-chat-7B-ggml

You need to install ``qwen-cpp`` first:

.. code-block:: bash

   pip install -U qwen-cpp

If you want to use BLAS to accelerate inference:

- OpenBLAS:

  .. code-block:: bash

     CMAKE_ARGS="-DGGML_OPENBLAS=ON" pip install -U qwen-cpp

- cuBLAS:

  .. code-block:: bash

     CMAKE_ARGS="-DGGML_CUBLAS=ON" pip install -U qwen-cpp

- Metal:

  .. code-block:: bash

     CMAKE_ARGS="-DGGML_METAL=ON" pip install -U qwen-cpp

Execute the following command to launch the model::

   xinference launch --model-name qwen-chat --size-in-billions 7 --model-format ggmlv3
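
Alternatively, you can launch the model from Python. A minimal sketch, assuming the ``Client.launch_model`` call of the Xinference Python client (the endpoint is a placeholder):

.. code-block:: python

   from xinference.client import Client

   # Replace with your real Xinference endpoint.
   client = Client("http://localhost:9997")

   # Launch the 7B ggmlv3 build; a model UID is returned on success.
   model_uid = client.launch_model(
       model_name="qwen-chat",
       model_format="ggmlv3",
       size_in_billions=7,
       quantization="q4_0",
   )
   print(model_uid)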


Model Spec 4 (ggmlv3, 14 Billion)
+++++++++++++++++++++++++++++++++

- **Model Format:** ggmlv3
- **Model Size (in billions):** 14
- **Quantizations:** q4_0
- **Model ID:** Xorbits/qwen-chat-14B-ggml

Install ``qwen-cpp`` as described above.

Execute the following command to launch the model::

   xinference launch --model-name qwen-chat --size-in-billions 14 --model-format ggmlv3
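
Once launched, the model can be used through a client handle. A hedged sketch, assuming ``Client.get_model`` and the OpenAI-style chat interface of Xinference chat models:

.. code-block:: python

   from xinference.client import Client

   client = Client("http://localhost:9997")  # replace with your endpoint
   model_uid = client.launch_model(
       model_name="qwen-chat",
       model_format="ggmlv3",
       size_in_billions=14,
   )

   # Chat models are assumed to return an OpenAI-style completion dict.
   model = client.get_model(model_uid)
   response = model.chat("What is the largest animal?")
   print(response["choices"][0]["message"]["content"])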

44 changes: 35 additions & 9 deletions doc/source/models/custom.rst
@@ -5,10 +5,10 @@ Custom Models
=============
Xinference provides a flexible and comprehensive way to integrate, manage, and utilize custom models.

-Define a custom model
-~~~~~~~~~~~~~~~~~~~~~
+Define a custom LLM model
+~~~~~~~~~~~~~~~~~~~~~~~~~

-Define a custom model based on the following template:
+Define a custom LLM model based on the following template:

.. code-block:: json

@@ -61,6 +61,29 @@ Define a custom model based on the following template:

* prompt_style: An optional field that may be required by chat models to define the style of prompts. The given example sets this to None; additional details can be found in the referenced file xinference/model/llm/tests/test_utils.py.


Define a custom embedding model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Define a custom embedding model based on the following template:

.. code-block:: json

   {
       "model_name": "custom-bge-base-en",
       "dimensions": 768,
       "max_tokens": 512,
       "language": ["en"],
       "model_id": "BAAI/bge-base-en",
       "model_uri": "file:///path/to/bge-base-en"
   }

* model_name: A string defining the name of the model. The name must start with a letter or a digit and can only contain letters, digits, underscores, or dashes.
* dimensions: An integer that specifies the embedding dimensions.
* max_tokens: An integer that represents the maximum sequence length that the embedding model supports.
* language: A list of strings representing the supported languages for the model. Example: ["en"], which means that the model supports English.
* model_id: A string representing the model ID, possibly referring to an identifier used by Hugging Face.
* model_uri: A string representing the URI where the model can be loaded from, such as "file:///path/to/your_model". If the model URI is absent, Xinference will try to download the model from Hugging Face using the model ID.
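
If you are unsure which values to use for ``dimensions`` and ``max_tokens``, you can read them off the checkpoint itself. A minimal sketch, assuming the model at ``model_uri`` is a ``sentence-transformers``-compatible checkpoint (the local path is a placeholder):

.. code-block:: python

   from sentence_transformers import SentenceTransformer

   # Load the local checkpoint referenced by model_uri.
   model = SentenceTransformer("/path/to/bge-base-en")

   # Embedding width -> the "dimensions" field of the spec.
   print(model.get_sentence_embedding_dimension())

   # Maximum input sequence length -> the "max_tokens" field of the spec.
   print(model.max_seq_length)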

Register a Custom Model
~~~~~~~~~~~~~~~~~~~~~~~

@@ -77,13 +100,16 @@ Register a custom model programmatically:
   # replace with real xinference endpoint
   endpoint = 'http://localhost:9997'
   client = Client(endpoint)
-   client.register_model(model_type="LLM", model=model, persist=False)
+   client.register_model(model_type="<model_type>", model=model, persist=False)

Or via CLI:

.. code-block:: bash

-   xinference register --model-type LLM --file model.json --persist
+   xinference register --model-type <model_type> --file model.json --persist

Note: replace ``<model_type>`` above with ``LLM`` or ``embedding``. The same applies below.
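
For example, to register the custom embedding model defined earlier, a sketch along these lines should work (the ``model.json`` file name and the endpoint are assumptions):

.. code-block:: python

   from xinference.client import Client

   # The spec from the "Define a custom embedding model" section,
   # assumed to be saved as model.json.
   with open("model.json") as f:
       model = f.read()

   client = Client("http://localhost:9997")  # replace with your endpoint
   client.register_model(model_type="embedding", model=model, persist=True)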


List the Built-in and Custom Models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -92,13 +118,13 @@ List built-in and custom models programmatically:

.. code-block:: python

-   registrations = client.list_model_registrations(model_type="LLM")
+   registrations = client.list_model_registrations(model_type="<model_type>")

Or via CLI:

.. code-block:: bash

-   xinference registrations --model-type LLM
+   xinference registrations --model-type <model_type>

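
The returned registrations can then be inspected from Python; a sketch, assuming each entry is a dict that exposes at least ``model_name`` and ``is_builtin`` fields:

.. code-block:: python

   from xinference.client import Client

   client = Client("http://localhost:9997")  # replace with your endpoint
   registrations = client.list_model_registrations(model_type="embedding")
   for registration in registrations:
       # Distinguish custom registrations from built-in ones.
       print(registration["model_name"], registration["is_builtin"])
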
Launch the Custom Model
~~~~~~~~~~~~~~~~~~~~~~~
@@ -162,10 +188,10 @@ Unregister the custom model programmatically:

.. code-block:: python

-   model = client.unregister_model(model_type='LLM', model_name='custom-llama-2')
+   model = client.unregister_model(model_type="<model_type>", model_name='custom-llama-2')

Or via CLI:

.. code-block:: bash

-   xinference unregister --model-type LLM --model-name custom-llama-2
+   xinference unregister --model-type <model_type> --model-name custom-llama-2
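
For instance, to remove the custom embedding model registered earlier (the model name follows the spec above, and the endpoint is a placeholder):

.. code-block:: python

   from xinference.client import Client

   client = Client("http://localhost:9997")  # replace with your endpoint
   client.unregister_model(model_type="embedding", model_name="custom-bge-base-en")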
