[Feature] Save cache from requests and load #1932

SinanAkkoyun · 2024-11-06T02:56:51Z

Checklist

1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
2. Please use English, otherwise it will be closed.

Motivation

This might be difficult to implement, but I am facing the following issue: when running Qwen2-VL on larger images, the preprocessor takes a long time to convert the images into tokens.

It would be great to have a way (for example, extra parameters on the OpenAI-compatible API) to tell the backend to store the cache of a request and later load it by ID in another request. That way, the same image (and, more generally, the same prompt) would not have to be reprocessed on every call.

If there is an easier way to solve this problem, I would be thankful for any input :)

Related resources

No response
