Question about tokenizer_manager.py line 200: Regarding the preprocessing of multimodal models #1824

CSEEduanyu · 2024-10-28T03:43:39Z

sglang/python/sglang/srt/managers/tokenizer_manager.py

Line 200 in 6fcd6d7

input_ids = image_inputs["input_ids"]

CSEEduanyu · 2024-10-28T03:48:20Z

Question 1: According to the code here, the multimodal input_ids are generated in the preprocessing stage, and that stage runs asynchronously. This means inference of the vision model and inference of the LLM may happen at the same time. Can the two conflict?
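
To make the concern concrete, here is a minimal sketch of the overlap being asked about. It is not sglang's code: preprocess_image, handle_request, and llm_step are hypothetical stand-ins, and the use of a thread pool is an assumption about how asynchronous preprocessing could be arranged.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def preprocess_image(image_id):
    # Hypothetical stand-in for image preprocessing (resize, normalize, ...).
    # Whether a vision encoder also runs here is exactly what Question 1 asks.
    return {"image_id": image_id, "pixel_values": [0.0] * 4}


async def handle_request(executor, image_id):
    # Offload preprocessing so the event loop stays free for other requests.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, preprocess_image, image_id)


async def llm_step(step):
    # Hypothetical stand-in for an LLM forward pass scheduled on the same loop.
    await asyncio.sleep(0)
    return f"decode-step-{step}"


async def main():
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Both tasks are in flight at once; gather returns results in call order.
        return await asyncio.gather(handle_request(executor, 1), llm_step(0))


results = asyncio.run(main())
```

In this sketch the two tasks share no state, so nothing conflicts. The question is whether that still holds when both sides use the same GPU.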

CSEEduanyu · 2024-10-28T03:50:51Z

Question 2: In the few multimodal models implemented so far, process_images_async() does not return an input_ids field, so the code on line 200 confuses me.

CSEEduanyu · 2024-10-28T03:55:26Z

My current dilemma is that I want to add support for a new multimodal model, but I don't know what process_images_async() should return: the input_ids that the images map to, or the raw pixel data?
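
The two options can be sketched as follows. This is an illustration under assumptions, not the actual tokenizer_manager.py logic: select_input_ids and the dictionary keys other than input_ids are hypothetical, and the fallback to the text tokenizer's ids is a guess at how a caller might treat a processor that returns no input_ids.

```python
def select_input_ids(text_input_ids, image_inputs):
    # Hypothetical caller-side logic: prefer ids supplied by the image
    # processor, otherwise keep the ids produced by the text tokenizer.
    if image_inputs and "input_ids" in image_inputs:
        return image_inputs["input_ids"]
    return text_input_ids


# Option A: the processor returns only pixel data (made-up values).
pixels_only = {"pixel_values": [[0.1, 0.2], [0.3, 0.4]]}

# Option B: the processor also returns token ids with image placeholders
# already expanded (made-up ids).
with_ids = {"pixel_values": [[0.1, 0.2]], "input_ids": [1, 900, 900, 2]}

text_ids = [1, 2]
ids_a = select_input_ids(text_ids, pixels_only)
ids_b = select_input_ids(text_ids, with_ids)
```

With a caller like this, both return shapes would work, which is why it is unclear which one a new model is expected to use.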
