
How can I speed up llmlingua2? #175

Open
yyjabiding opened this issue Aug 6, 2024 · 1 comment
yyjabiding commented Aug 6, 2024

Describe the issue

I have a context length of about 100k tokens.
Are there any methods to speed up compressing it with llmlingua2 so that it finishes in a short time, e.g. under 2 seconds?
Thanks.

@yyjabiding yyjabiding added the question Further information is requested label Aug 6, 2024
@iofu728 iofu728 self-assigned this Aug 22, 2024
iofu728 commented Aug 22, 2024

Hi @yyjabiding, thanks for your interest in LLMLingua.

Although we haven't tested it, it seems possible. LLMLingua-2 forwards a BERT-level model chunk by chunk, so increasing the batch size could potentially reduce latency. You can check the implementation here: LLMLingua Prompt Compressor.
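The idea above can be sketched in a few lines. This is a hypothetical illustration of chunking and batching, not the actual LLMLingua-2 API: `chunk_text` and `batch_chunks` are made-up helper names, and the 512-token chunk size and batch size of 8 are assumed values chosen for the example. The point is that grouping chunks into larger batches reduces the number of forward passes through the BERT-level model, trading GPU memory for latency.

```python
# Hypothetical sketch (not the real LLMLingua-2 API): split a long
# context into fixed-size chunks, then group chunks into batches so
# each forward pass of the BERT-level classifier handles several
# chunks at once.

def chunk_text(tokens, chunk_size=512):
    """Split a token list into chunks of at most chunk_size tokens."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

def batch_chunks(chunks, batch_size=8):
    """Group chunks into batches; a larger batch_size means fewer
    forward passes, at the cost of more GPU memory per pass."""
    return [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]

# For a ~100k-token context with 512-token chunks:
tokens = list(range(100_000))      # stand-in for real token IDs
chunks = chunk_text(tokens)
batches = batch_chunks(chunks)
print(len(chunks), len(batches))   # 196 chunks collapse into 25 batched passes
```

With these assumed sizes, a 100k-token context becomes 196 chunks; batching 8 at a time cuts 196 sequential forward passes down to 25, which is where most of the latency saving would come from.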
