open source model support #65

Status: Open. Wants to merge 1 commit into base branch `vyokky/dev`.
55 changes: 50 additions & 5 deletions model_worker/README.md
@@ -3,7 +3,7 @@ The lite version of the prompt is not fully optimized. To achieve better results
### If you use QWEN as the Agent

1. QWen (Tongyi Qianwen) is an LLM developed by Alibaba. Go to [QWen](https://dashscope.aliyun.com/), register an account, and get an API key. More details can be found [here](https://help.aliyun.com/zh/dashscope/developer-reference/activate-dashscope-and-create-an-api-key?spm=a2c4g.11186623.0.0.7b5749d72j3SYU) (in Chinese).
-2. Install the required packages dashscope or run the `setup.py` with `-qwen` options.
+2. Uncomment the required packages in requirements.txt or install them separately.
```bash
pip install dashscope
```
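To quickly confirm that the key is active before wiring it into UFO, a minimal sketch with the DashScope Python SDK might look like the following; `qwen-turbo` is only an example model name, and the exact call signature may differ across dashscope versions:

```python
import dashscope
from dashscope import Generation

# Assumption: the API key obtained from the DashScope console.
dashscope.api_key = "YOUR_API_KEY"

# A one-off text request just to confirm the key and quota are valid.
response = Generation.call(
    model="qwen-turbo",  # example; pick a model from the QWen model list below
    messages=[{"role": "user", "content": "Hello"}],
    result_format="message",
)

print(response.status_code)  # 200 on success
print(response.output)       # the generated message
```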
@@ -23,7 +23,7 @@ You can find the model name in the [QWen LLM model list](https://help.aliyun.com
We provide a short example below showing how to configure Ollama; the steps may change if Ollama makes updates.

```bash title="install ollama and serve LLMs in local" showLineNumbers
-## Install ollama on Linux & WSL2 or run the `setup.py` with `-ollama` options
+## Install ollama on Linux & WSL2.
curl https://ollama.ai/install.sh | sh
## Run the serving
ollama serve
@@ -45,19 +45,64 @@ When serving LLMs via Ollama, it will by default start a server at `http://local
"API_MODEL": "YOUR_MODEL"
}
```
-NOTE: `API_BASE` is the URL started in the Ollama LLM server and `API_MODEL` is the model name of Ollama LLM, it should be same as the one you served before. In addition, due to model limitations, you can use lite version of prompt to have a taste on UFO which can be configured in `config_dev.yaml`. Attention to the top ***note***.
+NOTE: `API_BASE` is the URL of the Ollama server and `API_MODEL` is the name of the Ollama model; it should be the same as the one you served earlier. In addition, due to model limitations, you can use the lite version of the prompt to get a taste of UFO, which can be configured in `config_dev.yaml`. Pay attention to the ***NOTE*** at the top.
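Before pointing UFO at the server, it can help to confirm that Ollama is reachable and that the model name matches what you pulled; a small sanity check, assuming the default port `11434`:

```bash
## List the models the local Ollama server knows about; the name shown here is what
## goes into API_MODEL, and http://localhost:11434 is what goes into API_BASE.
curl http://localhost:11434/api/tags
ollama list
```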

#### If you use your custom model as the Agent
1. Start a server with your model, which will later be used as the API base in `config.yaml`.

2. Add the following configuration to `config.yaml`:
```json showLineNumbers
{
"API_TYPE": "custom_model" ,
"API_TYPE": "Custom" ,
"API_BASE": "YOUR_ENDPOINT",
"API_KEY": "YOUR_KEY",
"API_MODEL": "YOUR_MODEL"
}
```

-NOTE: You should create a new Python script <custom_model>.py in the ufo/llm folder like the format of the <placeholder>.py, which needs to inherit `BaseService` as the parent class, as well as the `__init__` and `chat_completion` methods. At the same time, you need to add the dynamic import of your file in the `get_service` method of `BaseService`.
+NOTE: You should create a new Python script `custom_model.py` in the `ufo/llm` folder, following the format of `placeholder.py`. It must inherit `BaseService` as the parent class and implement the `__init__` and `chat_completion` methods. You also need to add the dynamic import of your file in the `get_service` method of `BaseService`. A minimal skeleton is sketched below.
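For illustration, a minimal skeleton of such a script might look like this; the class name `CustomModelService` and the request payload are assumptions about your own server, not part of UFO:

```python
# ufo/llm/custom_model.py -- hypothetical skeleton of a user-defined service.
from typing import Any, Optional

import requests

from .base import BaseService


class CustomModelService(BaseService):
    def __init__(self, config, agent_type: str):
        # config is the parsed config.yaml; agent_type selects HOST_AGENT / APP_AGENT / BACKUP_AGENT.
        self.config_llm = config[agent_type]
        self.config = config

    def chat_completion(self, messages, n, temperature: Optional[float] = None,
                        max_tokens: Optional[int] = None, top_p: Optional[float] = None,
                        **kwargs: Any):
        # Convert the UFO message list into whatever your server expects, call
        # YOUR_ENDPOINT/chat/completions, and return (list_of_texts, cost_or_None).
        payload = {"prompt": str(messages), "max_new_tokens": max_tokens or 2048}
        response = requests.post(self.config_llm["API_BASE"] + "/chat/completions", json=payload)
        return [response.json()["text"]] * n, None
```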

#### Example
UFO also provides ***LLaVA-1.5*** and ***CogAgent*** as examples.

1.1 Download the essential libraries for your custom model.

#### If you use LLaVA-1.5 as the Agent

Please refer to the [LLaVA](https://github.com/haotian-liu/LLaVA) project to download and prepare the LLaVA-1.5 model, for example:

```bash
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip # enable PEP 660 support
pip install -e .
```

#### If you use CogAgent as the Agent

Please refer to the [CogVLM](https://github.com/THUDM/CogVLM) project to download and prepare the CogAgent model. Download the sat version of the CogAgent weights `cogagent-chat.zip` from [here](https://huggingface.co/THUDM/CogAgent/tree/main) and unzip it.

1.2 Start your custom model. You must adapt your model server to support the UFO interface; for simplicity, expose it at `YOUR_ENDPOINT/chat/completions`.

#### If you use LLaVA as the Agent
Add the `direct_generate_llava` method and a new POST endpoint `/chat/completions` from `custom_worker.py` into `llava/serve/model_worker.py`, and start it with the following command:
```bash
python -m llava.serve.llava_model_worker --host YOUR_HOST --port YOUR_PORT --worker YOUR_ENDPOINT --model-path liuhaotian/llava-v1.5-13b --no-register
```

#### If you use CogAgent as the Agent
You can modify the model generation in `basic_demo/cli_demo.py` and expose it behind a new POST endpoint `/chat/completions` to use it with UFO. A rough sketch of such a wrapper follows.
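The sketch below shows one possible shape of that wrapper; `generate_response` is a placeholder for the generation routine you factor out of `basic_demo/cli_demo.py`, so this is an assumption about how you structure the server rather than the literal CogVLM code:

```python
# Hypothetical wrapper: expose CogAgent generation behind the /chat/completions
# interface that UFO's CogAgentService calls.
from fastapi import FastAPI, Request
import uvicorn

app = FastAPI()


def generate_response(prompt, image_base64, temperature, top_p, max_new_tokens):
    # Placeholder: replace with the generation routine factored out of
    # basic_demo/cli_demo.py (loading the sat cogagent-chat weights).
    raise NotImplementedError("wire this to the CogAgent model from cli_demo.py")


@app.post("/chat/completions")
async def chat_completions(request: Request):
    params = await request.json()
    text = generate_response(
        prompt=params["prompt"],
        image_base64=params.get("image"),
        temperature=params.get("temperature", 0.9),
        top_p=params.get("top_p", 0.9),
        max_new_tokens=params.get("max_new_tokens", 2048),
    )
    # UFO's CogAgentService reads the completion from the "text" field.
    return {"text": text}


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```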

3. Add the following configuration to `config.yaml`:
```json showLineNumbers
{
"API_TYPE": "Custom" ,
"API_BASE": "YOUR_ENDPOINT",
"API_MODEL": "YOUR_MODEL"
}
```

***Note***: Only LLaVA and CogAgent are supported as open source models for now. If you want to use your own model, remember to modify the dynamic import of your model file in the `get_service` method of `BaseService` in `ufo/llm/base.py`.
67 changes: 67 additions & 0 deletions model_worker/custom_worker.py
@@ -0,0 +1,67 @@
# Method to generate a response from a prompt and an image using the LLaVA model.
# It is intended to be added to the ModelWorker class in llava/serve/model_worker.py.
@torch.inference_mode()
def direct_generate_llava(self, params):
    tokenizer, model, image_processor = self.tokenizer, self.model, self.image_processor

    prompt = params["prompt"]
    image = params.get("image", None)
    if image is not None:
        if DEFAULT_IMAGE_TOKEN not in prompt:
            raise ValueError("Number of images does not match number of <image> tokens in prompt")

        image = load_image_from_base64(image)
        image = image_processor.preprocess(image, return_tensors='pt')['pixel_values'][0]
        image = image.to(self.model.device, dtype=self.model.dtype)
        images = image.unsqueeze(0)

        replace_token = DEFAULT_IMAGE_TOKEN
        if getattr(self.model.config, 'mm_use_im_start_end', False):
            replace_token = DEFAULT_IM_START_TOKEN + replace_token + DEFAULT_IM_END_TOKEN
        prompt = prompt.replace(DEFAULT_IMAGE_TOKEN, replace_token)

        num_image_tokens = prompt.count(replace_token) * model.get_vision_tower().num_patches
    else:
        return {"text": "No image provided", "error_code": 0}

    temperature = float(params.get("temperature", 1.0))
    top_p = float(params.get("top_p", 1.0))
    max_context_length = getattr(model.config, 'max_position_embeddings', 2048)
    max_new_tokens = min(int(params.get("max_new_tokens", 256)), 1024)
    stop_str = params.get("stop", None)  # collected but not used by this simplified generation path
    do_sample = True if temperature > 0.001 else False

    input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors='pt').unsqueeze(0).to(self.device)
    keywords = [stop_str]
    # Leave room in the context window for the prompt tokens and the expanded image tokens.
    max_new_tokens = min(max_new_tokens, max_context_length - input_ids.shape[-1] - num_image_tokens)

    input_seq_len = input_ids.shape[1]

    generation_output = self.model.generate(
        inputs=input_ids,
        do_sample=do_sample,
        temperature=temperature,
        top_p=top_p,
        max_new_tokens=max_new_tokens,
        images=images,
        use_cache=True,
    )

    # Decode only the newly generated tokens, skipping the prompt.
    generation_output = generation_output[0, input_seq_len:]
    decoded = tokenizer.decode(generation_output, skip_special_tokens=True)

    response = {"text": decoded}
    print("response", response)
    return response


# The API server is included in the LLaVA and CogAgent installations. If you customize your
# model, you can install fastapi via pip or uncomment the library in requirements.txt.
# from fastapi import FastAPI, Request
# app = FastAPI()

# For LLaVA
@app.post("/chat/completions")
async def generate_llava(request: Request):
    params = await request.json()
    response_data = worker.direct_generate_llava(params)
    return response_data
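Once a worker with this handler is running, the endpoint can be exercised directly; a small hedged example of the request shape it expects (host, port, and the screenshot file are placeholders, and `<image>` stands for LLaVA's image token):

```python
# Client-side smoke test for the /chat/completions endpoint defined above.
import base64

import requests

with open("screenshot.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "prompt": "<image>\nDescribe what is on the screen.",  # the prompt must contain the image token
    "image": image_base64,
    "temperature": 0.2,
    "top_p": 0.9,
    "max_new_tokens": 256,
}

resp = requests.post("http://YOUR_HOST:YOUR_PORT/chat/completions", json=payload)
print(resp.json()["text"])
```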
2 changes: 1 addition & 1 deletion requirements.txt
Review comment from the PR author on the Pillow bump: from the previous version, it should be 10.3.0.

@@ -4,7 +4,7 @@ langchain==0.1.11
langchain_community==0.0.27
msal==1.25.0
openai==1.13.3
-Pillow==10.2.0
+Pillow==10.3.0
pywin32==306
pywinauto==0.6.8
PyYAML==6.0.1
18 changes: 18 additions & 0 deletions ufo/config/config.yaml.template
@@ -15,6 +15,12 @@ HOST_AGENT: {
# API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
# API_MODEL: "YOUR_MODEL", # The only OpenAI model by now that accepts visual input
# API_DEPLOYMENT_ID: "gpt-4-visual-preview", # The deployment id for the AOAI API

### Comment out the section above and uncomment the following as needed if using "Qwen", "Ollama" or "Custom".
# API_TYPE: "Custom",
# API_BASE: "YOUR_ENDPOINT",
# API_KEY: "YOUR_KEY",
# API_MODEL: "YOUR_MODEL",

### For Azure_AD
# AAD_TENANT_ID: "YOUR_TENANT_ID", # Set the value to your tenant id for the llm model
@@ -39,6 +45,12 @@ APP_AGENT: {
# API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
# API_MODEL: "YOUR_MODEL", # The only OpenAI model by now that accepts visual input
# API_DEPLOYMENT_ID: "gpt-4-visual-preview", # The deployment id for the AOAI API

### Comment out the section above and uncomment the following as needed if using "Qwen", "Ollama" or "Custom".
# API_TYPE: "Custom",
# API_BASE: "YOUR_ENDPOINT",
# API_KEY: "YOUR_KEY",
# API_MODEL: "YOUR_MODEL",

### For Azure_AD
# AAD_TENANT_ID: "YOUR_TENANT_ID", # Set the value to your tenant id for the llm model
@@ -63,6 +75,12 @@ BACKUP_AGENT: {
# API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
# API_MODEL: "YOUR_MODEL", # The only OpenAI model by now that accepts visual input
# API_DEPLOYMENT_ID: "gpt-4-visual-preview", # The deployment id for the AOAI API

### Comment out the section above and uncomment the following as needed if using "Qwen", "Ollama" or "Custom".
# API_TYPE: "Custom",
# API_BASE: "YOUR_ENDPOINT",
# API_KEY: "YOUR_KEY",
# API_MODEL: "YOUR_MODEL",

### For Azure_AD
# AAD_TENANT_ID: "YOUR_TENANT_ID", # Set the value to your tenant id for the llm model
29 changes: 27 additions & 2 deletions ufo/llm/base.py
@@ -14,22 +14,47 @@ def chat_completion(self, *args, **kwargs):
        pass

    @staticmethod
-    def get_service(name):
+    def get_service(name, model_name=None):
        """
        Get the service based on the given name and custom model.
        Args:
            name (str): The name of the service.
            model_name (str, optional): The model name.
        Returns:
            object: The service object.
        Raises:
            ValueError: If the given service name or model name is not supported.
        """
        service_map = {
            'openai': 'OpenAIService',
            'aoai': 'OpenAIService',
            'azure_ad': 'OpenAIService',
            'qwen': 'QwenService',
            'ollama': 'OllamaService',
            'placeholder': 'PlaceHolderService',
            'custom': 'CustomService',
        }
        custom_service_map = {
            'llava': 'LlavaService',
            'cogagent': 'CogAgentService',
        }
        service_name = service_map.get(name, None)
        if service_name:
            if name in ['aoai', 'azure_ad']:
                module = import_module('.openai', package='ufo.llm')
            elif service_name == 'CustomService':
                # Route custom models: any model name containing "llava" resolves to the
                # llava module; otherwise the model name itself is used as the module name.
                custom_model = 'llava' if 'llava' in model_name else model_name
                custom_service_name = custom_service_map.get('llava' if 'llava' in custom_model else custom_model, None)
                if custom_service_name:
                    module = import_module('.' + custom_model, package='ufo.llm')
                    service_name = custom_service_name
                else:
                    raise ValueError(f'Custom model {custom_model} not supported')
            else:
                module = import_module('.' + name.lower(), package='ufo.llm')
            return getattr(module, service_name)
        else:
            raise ValueError(f'Model {name} not supported')

    def get_cost_estimator(self, api_type, model, prices, prompt_tokens, completion_tokens) -> float:
        """
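For reference, a short hedged sketch of how the updated factory is expected to be used; the model name is a placeholder and the lookup simply follows the mapping above:

```python
# Hypothetical use of the updated factory: resolve a service class from API_TYPE / API_MODEL.
from ufo.llm.base import BaseService

# API_TYPE "custom" routes through custom_service_map: a model name containing "llava"
# resolves to LlavaService (ufo/llm/llava.py), and "cogagent" to CogAgentService.
service_class = BaseService.get_service("custom", model_name="llava-v1.5-13b")
print(service_class.__name__)  # -> LlavaService
```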
81 changes: 81 additions & 0 deletions ufo/llm/cogagent.py
@@ -0,0 +1,81 @@
import time
from typing import Any, Optional

import requests

from ufo.utils import print_with_color
from .base import BaseService


class CogAgentService(BaseService):
    def __init__(self, config, agent_type: str):
        self.config_llm = config[agent_type]
        self.config = config
        self.max_retry = self.config["MAX_RETRY"]
        self.timeout = self.config["TIMEOUT"]
        self.max_tokens = 2048  # default max tokens for cogagent for now

    def chat_completion(
        self,
        messages,
        n,
        temperature: Optional[float] = None,
        max_tokens: Optional[int] = None,
        top_p: Optional[float] = None,
        **kwargs: Any,
    ):
        """
        Generate chat completions based on given messages.
        Args:
            messages (list): A list of messages.
            n (int): The number of completions to generate.
            temperature (float, optional): The temperature for sampling. Defaults to None.
            max_tokens (int, optional): The maximum number of tokens in the completion. Defaults to None.
            top_p (float, optional): The cumulative probability for top-p sampling. Defaults to None.
            **kwargs: Additional keyword arguments.
        Returns:
            tuple: A tuple containing the generated texts and None.
        """

        temperature = temperature if temperature is not None else self.config["TEMPERATURE"]
        max_tokens = max_tokens if max_tokens is not None else self.config["MAX_TOKENS"]
        top_p = top_p if top_p is not None else self.config["TOP_P"]

        texts = []
        for i in range(n):
            image_base64 = None
            if self.config_llm["VISUAL_MODE"]:
                # The screenshot arrives as a base64 data URL in the second-to-last content item.
                image_base64 = messages[1]['content'][-2]['image_url']['url'].split('base64,')[1]
            prompt = messages[0]['content'] + messages[1]['content'][-1]['text']

            payload = {
                'model': self.config_llm['API_MODEL'],
                'prompt': prompt,
                'temperature': temperature,
                'top_p': top_p,
                'max_new_tokens': self.max_tokens,
                'image': image_base64,
            }

            for _ in range(self.max_retry):
                try:
                    response = requests.post(self.config_llm['API_BASE'] + "/chat/completions", json=payload)
                    if response.status_code == 200:
                        response = response.json()
                        text = response["text"]
                        texts.append(text)
                        break
                    else:
                        raise Exception(
                            f"Failed to get completion with error code {response.status_code}: {response.text}",
                        )
                except Exception as e:
                    print_with_color(f"Error making API request: {e}", "red")
                    try:
                        print_with_color(response, "red")
                    except:
                        pass
                    time.sleep(3)
                    continue
        return texts, None
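As a usage sketch, the service can be exercised outside of UFO with a hand-built config and message list; the dictionary below only mirrors the keys this class reads, and all values are placeholders:

```python
# Hypothetical smoke test for CogAgentService against a running /chat/completions server.
from ufo.llm.cogagent import CogAgentService

config = {
    "MAX_RETRY": 3,
    "TIMEOUT": 60,
    "TEMPERATURE": 0.0,
    "MAX_TOKENS": 2048,
    "TOP_P": 1.0,
    "APP_AGENT": {
        "VISUAL_MODE": True,
        "API_MODEL": "cogagent-chat",
        "API_BASE": "http://YOUR_HOST:YOUR_PORT",
    },
}

messages = [
    {"role": "system", "content": "You are UFO's application agent. "},
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,<BASE64_SCREENSHOT>"}},
        {"type": "text", "text": "Click the Save button."},
    ]},
]

service = CogAgentService(config, agent_type="APP_AGENT")
texts, _ = service.chat_completion(messages, n=1)
print(texts[0])
```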