Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Difference between TokenizerManager and Runtime class #1898

Open
NrKhader opened this issue Nov 3, 2024 · 1 comment
Open

Difference between TokenizerManager and Runtime class #1898

NrKhader opened this issue Nov 3, 2024 · 1 comment

Comments

@NrKhader
Copy link

NrKhader commented Nov 3, 2024

I'm creating a class instance from the Runtime class and trying to generate text using the async_generate method

I tested that on gemma2 and for large context length the endpoint doesn't generate except for one token

This given that when I test it using sglang launch_server it works fine

I tested the same implementation on llama3.1 with large context length and it worked fine using the runtime class

So I'm wondering if there's a difference between the Runtime class and the TokenizerManager class?

This is an example output with loading the model with 8k context length

{'id': 'bc58d0f1564d4d3a87bcf444fd087690',
 'object': 'chat.completion',
 'created': 1730626163,
 'model': 'google/gemma-2-2b-it',
 'choices': [{'index': 0,
   'message': {'role': 'assistant', 'content': ''},
   'logprobs': None,
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 7051,
  'total_tokens': 7053,
  'completion_tokens': 2}}
@ByronHsu
Copy link
Collaborator

ByronHsu commented Nov 4, 2024

I am a bit confused by your question. you mean you served gemma using runtime class and launch_server but they generated different results?

For the diff between TokenizerManager and Runtime, runtime contains tokenizer manager. You can see the comments here for details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants