[V1] Feedback Thread #12568

Open
simon-mo opened this issue Jan 30, 2025 · 10 comments

@simon-mo
Collaborator

simon-mo commented Jan 30, 2025

Please leave comments here about your usage of V1: does it work? Does it not work? Which features do you need in order to adopt it? Any bugs?

For bug reports, please file them separately and link the issues here.

For in-depth discussion, please feel free to join #sig-v1 in the vLLM Slack workspace.

@simon-mo simon-mo added the misc label Jan 30, 2025
@simon-mo simon-mo changed the title [V1] Feedback Threads [V1] Feedback Thread Jan 30, 2025
@simon-mo simon-mo removed the misc label Jan 30, 2025
@simon-mo simon-mo pinned this issue Jan 30, 2025
@wedobetter

wedobetter commented Jan 30, 2025

👍 I have not done a proper benchmark, but V1 feels superior, i.e. higher throughput and lower latency/TTFT.
The other thing I have noticed is that the logging has changed: it now prints `Running: 1 reqs, Waiting: 0 reqs`, whereas it used to print stats such as tokens/s.

I have encountered a possible higher memory consumption issue, but am overall very pleased with the vLLM community's hard work on V1.
#12529
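
(For context, "V1" throughout this thread is the engine opted into with the VLLM_USE_V1=1 environment variable mentioned later in the thread; below is a minimal sketch of enabling it for offline inference, with a placeholder model name.)

```python
# Minimal sketch of opting into the V1 engine via the VLLM_USE_V1 environment
# variable mentioned later in this thread; the model name is a placeholder.
import os

os.environ["VLLM_USE_V1"] = "1"  # must be set before vLLM constructs the engine

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```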

@m-harmonic

Does anyone know about this bug with n>1? Thanks
#12584

@robertgshaw2-redhat
Collaborator

> Does anyone know about this bug with n>1? Thanks #12584

Thanks, we are aware and have some ongoing PRs for it.

#10980

@robertgshaw2-redhat
Collaborator

> The other thing I have noticed is that the logging has changed: it now prints `Running: 1 reqs, Waiting: 0 reqs`, whereas it used to print stats such as tokens/s.
>
> I have encountered a possible higher memory consumption issue, but am overall very pleased with the vLLM community's hard work on V1.

Logging is in progress. Current main has a lot more logging, and we will maintain compatibility with V0. Thanks!

@dchichkov

Quick feedback [VLLM_USE_V1=1]:

  • n > 1 would be nice

  • guided_grammar (or anything guided really) would be nice
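
For reference, here is a minimal sketch of what the two requested features look like in the offline API at the time (i.e. outside V1); the model name is a placeholder, and GuidedDecodingParams is assumed to be importable from vllm.sampling_params as in recent releases:

```python
# Sketch of the two requested features as they look in the offline API
# (V0 behaviour at the time of this thread). The model name is a placeholder
# and GuidedDecodingParams is assumed to be importable from vllm.sampling_params.
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model

# n > 1: several completions per prompt
params_n = SamplingParams(n=2, temperature=0.8, max_tokens=32)
for completion in llm.generate(["Write a haiku about GPUs."], params_n)[0].outputs:
    print(completion.text)

# Guided decoding: constrain the output to a fixed set of choices
params_guided = SamplingParams(
    guided_decoding=GuidedDecodingParams(choice=["Positive", "Negative"]),
    max_tokens=8,
)
print(llm.generate(["Sentiment of 'vLLM V1 feels fast':"], params_guided)[0].outputs[0].text)
```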

@robertgshaw2-redhat
Collaborator

> Quick feedback [VLLM_USE_V1=1]:
>
>   • n > 1 would be nice
>   • guided_grammar (or anything guided really) would be nice

Thanks, both are in progress

@hibukipanim

Are logprobs outputs (and specifically prompt logprobs with echo=True) expected to work with the current V1 (0.7.0)?
Checking here before opening an issue with a repro.
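
For reference, the kind of request being asked about looks roughly like this against vLLM's OpenAI-compatible completions endpoint; the base_url, api_key, and model name are placeholders:

```python
# Sketch of the request in question against vLLM's OpenAI-compatible
# completions endpoint; base_url, api_key, and model are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    prompt="The capital of France is",
    max_tokens=8,
    logprobs=1,  # per-token logprobs for the completion
    echo=True,   # include the prompt (and its logprobs) in the response
)
print(resp.choices[0].logprobs)
```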

@akshay-loci

Maybe there is a better place to discuss this, but the implementation for models that use more than one extra modality is quite non-intuitive. get_multimodal_embeddings() expects us to return a list or tensor whose length equals the number of multimodal items provided in the batch, and we then have to make unintuitive assumptions about what the output passed into get_input_embeddings() will look like, because the batching used when calling the two functions is not the same. It would be much nicer if, for example, the input and output of get_multimodal_embeddings() were dicts keyed by modality.
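
To make the suggestion concrete, here is a rough sketch of the dict-keyed shape being proposed; the method names match the existing interface, but these signatures are the proposal, not the current vLLM API:

```python
# Hypothetical sketch of the dict-keyed interface proposed above; this is NOT
# the current vLLM API, where get_multimodal_embeddings returns a flat
# list/tensor whose ordering has to be matched up inside get_input_embeddings.
from typing import Mapping, Optional

import torch


class MultiModalModelSketch:
    def get_multimodal_embeddings(self, **kwargs) -> Mapping[str, torch.Tensor]:
        # One entry per modality, e.g. {"image": image_embeds, "audio": audio_embeds},
        # so callers never have to reason about cross-modality ordering.
        raise NotImplementedError

    def get_input_embeddings(
        self,
        input_ids: torch.Tensor,
        multimodal_embeddings: Optional[Mapping[str, torch.Tensor]] = None,
    ) -> torch.Tensor:
        # Merge text embeddings with each modality's embeddings by key.
        raise NotImplementedError
```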

@robertgshaw2-redhat
Collaborator

> Are logprobs outputs (and specifically prompt logprobs with echo=True) expected to work with the current V1 (0.7.0)? Checking here before opening an issue with a repro.

Still in progress
