DOC: Register custom embedding model (#683)
ChengjieLi28 authored Nov 24, 2023
1 parent 3ddc112 commit 8fd2e3b
Showing 2 changed files with 93 additions and 9 deletions.
58 changes: 58 additions & 0 deletions doc/source/models/builtin/qwen-chat.rst
@@ -43,3 +43,61 @@ Execute the following command to launch the model::
.. note::

   4-bit and 8-bit quantization are not supported on macOS.

Model Spec 3 (ggmlv3, 7 Billion)
++++++++++++++++++++++++++++++++

- **Model Format:** ggmlv3
- **Model Size (in billions):** 7
- **Quantizations:** q4_0
- **Model ID:** Xorbits/qwen-chat-7B-ggml

You need to install ``qwen-cpp`` first:

.. code-block:: bash

   pip install -U qwen-cpp

If you want to use BLAS to accelerate inference:

- OpenBLAS:

  .. code-block:: bash

     CMAKE_ARGS="-DGGML_OPENBLAS=ON" pip install -U qwen-cpp

- cuBLAS:

  .. code-block:: bash

     CMAKE_ARGS="-DGGML_CUBLAS=ON" pip install -U qwen-cpp

- Metal:

  .. code-block:: bash

     CMAKE_ARGS="-DGGML_METAL=ON" pip install -U qwen-cpp

Execute the following command to launch the model::

   xinference launch --model-name qwen-chat --size-in-billions 7 --model-format ggmlv3
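
Alternatively, you can launch the model from Python. A minimal sketch, assuming the ``Client.launch_model`` call of the Xinference Python client (the endpoint is a placeholder):

.. code-block:: python

   from xinference.client import Client

   # Replace with your real Xinference endpoint.
   client = Client("http://localhost:9997")

   # Launch the 7B ggmlv3 build; a model UID is returned on success.
   model_uid = client.launch_model(
       model_name="qwen-chat",
       model_format="ggmlv3",
       size_in_billions=7,
       quantization="q4_0",
   )
   print(model_uid)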


Model Spec 4 (ggmlv3, 14 Billion)
+++++++++++++++++++++++++++++++++

- **Model Format:** ggmlv3
- **Model Size (in billions):** 14
- **Quantizations:** q4_0
- **Model ID:** Xorbits/qwen-chat-14B-ggml

Install ``qwen-cpp`` as described above.

Execute the following command to launch the model::

   xinference launch --model-name qwen-chat --size-in-billions 14 --model-format ggmlv3
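
Once launched, the model can be used through a client handle. A hedged sketch, assuming ``Client.get_model`` and the OpenAI-style chat interface of Xinference chat models:

.. code-block:: python

   from xinference.client import Client

   client = Client("http://localhost:9997")  # replace with your endpoint
   model_uid = client.launch_model(
       model_name="qwen-chat",
       model_format="ggmlv3",
       size_in_billions=14,
   )

   # Chat models are assumed to return an OpenAI-style completion dict.
   model = client.get_model(model_uid)
   response = model.chat("What is the largest animal?")
   print(response["choices"][0]["message"]["content"])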

44 changes: 35 additions & 9 deletions doc/source/models/custom.rst
@@ -5,10 +5,10 @@ Custom Models
=============
Xinference provides a flexible and comprehensive way to integrate, manage, and utilize custom models.

-Define a custom model
-~~~~~~~~~~~~~~~~~~~~~
+Define a custom LLM model
+~~~~~~~~~~~~~~~~~~~~~~~~~

-Define a custom model based on the following template:
+Define a custom LLM model based on the following template:

.. code-block:: json

@@ -61,6 +61,29 @@ Define a custom model based on the following template:

* prompt_style: An optional field that may be required by chat models to define the style of prompts. The given example sets this to None; additional details can be found in the referenced file xinference/model/llm/tests/test_utils.py.


Define a custom embedding model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Define a custom embedding model based on the following template:

.. code-block:: json

   {
       "model_name": "custom-bge-base-en",
       "dimensions": 768,
       "max_tokens": 512,
       "language": ["en"],
       "model_id": "BAAI/bge-base-en",
       "model_uri": "file:///path/to/bge-base-en"
   }

* model_name: A string defining the name of the model. The name must start with a letter or a digit and can only contain letters, digits, underscores, or dashes.
* dimensions: An integer that specifies the embedding dimensions.
* max_tokens: An integer that represents the maximum sequence length that the embedding model supports.
* language: A list of strings representing the supported languages for the model. Example: ["en"], which means that the model supports English.
* model_id: A string representing the model ID, possibly referring to an identifier used by Hugging Face.
* model_uri: A string representing the URI where the model can be loaded from, such as "file:///path/to/your_model". If the model URI is absent, Xinference will try to download the model from Hugging Face using the model ID.
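
If you are unsure which values to use for ``dimensions`` and ``max_tokens``, you can read them off the checkpoint itself. A minimal sketch, assuming the model at ``model_uri`` is a ``sentence-transformers``-compatible checkpoint (the local path is a placeholder):

.. code-block:: python

   from sentence_transformers import SentenceTransformer

   # Load the local checkpoint referenced by model_uri.
   model = SentenceTransformer("/path/to/bge-base-en")

   # Embedding width -> the "dimensions" field of the spec.
   print(model.get_sentence_embedding_dimension())

   # Maximum input sequence length -> the "max_tokens" field of the spec.
   print(model.max_seq_length)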

Register a Custom Model
~~~~~~~~~~~~~~~~~~~~~~~

@@ -77,13 +100,16 @@ Register a custom model programmatically:
   # replace with real xinference endpoint
   endpoint = 'http://localhost:9997'
   client = Client(endpoint)
-   client.register_model(model_type="LLM", model=model, persist=False)
+   client.register_model(model_type="<model_type>", model=model, persist=False)

Or via CLI:

.. code-block:: bash

-   xinference register --model-type LLM --file model.json --persist
+   xinference register --model-type <model_type> --file model.json --persist

Note: replace ``<model_type>`` above with ``LLM`` or ``embedding``. The same applies below.
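
For example, to register the custom embedding model defined earlier, a sketch along these lines should work (the ``model.json`` file name and the endpoint are assumptions):

.. code-block:: python

   from xinference.client import Client

   # The spec from the "Define a custom embedding model" section,
   # assumed to be saved as model.json.
   with open("model.json") as f:
       model = f.read()

   client = Client("http://localhost:9997")  # replace with your endpoint
   client.register_model(model_type="embedding", model=model, persist=True)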


List the Built-in and Custom Models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -92,13 +118,13 @@ List built-in and custom models programmatically:

.. code-block:: python

-   registrations = client.list_model_registrations(model_type="LLM")
+   registrations = client.list_model_registrations(model_type="<model_type>")

Or via CLI:

.. code-block:: bash

-   xinference registrations --model-type LLM
+   xinference registrations --model-type <model_type>

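
The returned registrations can then be inspected from Python; a sketch, assuming each entry is a dict that exposes at least ``model_name`` and ``is_builtin`` fields:

.. code-block:: python

   from xinference.client import Client

   client = Client("http://localhost:9997")  # replace with your endpoint
   registrations = client.list_model_registrations(model_type="embedding")
   for registration in registrations:
       # Distinguish custom registrations from built-in ones.
       print(registration["model_name"], registration["is_builtin"])
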
Launch the Custom Model
~~~~~~~~~~~~~~~~~~~~~~~
@@ -162,10 +188,10 @@ Unregister the custom model programmatically:

.. code-block:: python

-   model = client.unregister_model(model_type='LLM', model_name='custom-llama-2')
+   model = client.unregister_model(model_type="<model_type>", model_name='custom-llama-2')

Or via CLI:

.. code-block:: bash

-   xinference unregister --model-type LLM --model-name custom-llama-2
+   xinference unregister --model-type <model_type> --model-name custom-llama-2
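
For instance, to remove the custom embedding model registered earlier (the model name follows the spec above, and the endpoint is a placeholder):

.. code-block:: python

   from xinference.client import Client

   client = Client("http://localhost:9997")  # replace with your endpoint
   client.unregister_model(model_type="embedding", model_name="custom-bge-base-en")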
