I'm creating an instance of the `Runtime` class and trying to generate text using the `async_generate` method. I tested this on `gemma2`, and for a large context length the endpoint generates only a single token. By contrast, when I serve the same model with sglang's `launch_server`, it works fine. I also tested the same implementation on llama3.1 with a large context length, and it worked fine through the `Runtime` class.

So I'm wondering: is there a difference between the `Runtime` class and the `TokenizerManager` class? My setup looks roughly like the sketch below.
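A minimal sketch of the setup (illustrative only; the exact `Runtime` constructor arguments and the streaming format that `async_generate` yields may differ across sglang versions, so treat the specifics here as assumptions):

```python
import asyncio

import sglang as sgl

# Illustrative sketch, not the exact code from the report: constructor
# arguments and the chunks yielded by async_generate may vary by version.
runtime = sgl.Runtime(
    model_path="google/gemma-2-2b-it",
    context_length=8192,  # load the model with an 8k context window
)

async def main() -> None:
    # Stand-in for a long (~7k-token) prompt.
    prompt = "Summarize the following document. " + "lorem ipsum " * 3000
    sampling_params = {"max_new_tokens": 256, "temperature": 0.0}
    # async_generate streams the output; print pieces as they arrive.
    async for chunk in runtime.async_generate(prompt, sampling_params):
        print(chunk, end="", flush=True)

asyncio.run(main())
runtime.shutdown()
```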
This is an example output when loading the model with an 8k context length:

```python
{
    'id': 'bc58d0f1564d4d3a87bcf444fd087690',
    'object': 'chat.completion',
    'created': 1730626163,
    'model': 'google/gemma-2-2b-it',
    'choices': [{
        'index': 0,
        'message': {'role': 'assistant', 'content': ''},
        'logprobs': None,
        'finish_reason': 'stop',
    }],
    'usage': {'prompt_tokens': 7051, 'total_tokens': 7053, 'completion_tokens': 2},
}
```
I am a bit confused by your question. Do you mean you served gemma with both the `Runtime` class and `launch_server`, but they generated different results?

As for the difference between `TokenizerManager` and `Runtime`: `Runtime` contains a `TokenizerManager`. You can see the comments here for details.
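For reference, the `launch_server` path described in the report would be exercised roughly like this; the response shape matches the chat.completion output above, though the port, flag names, and prompt are assumptions:

```python
# Assumes a server started with something like:
#   python -m sglang.launch_server --model-path google/gemma-2-2b-it \
#       --context-length 8192 --port 30000
# (flag names and port are illustrative; check --help for your version)
import requests

resp = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "google/gemma-2-2b-it",
        "messages": [{"role": "user", "content": "<~7k-token prompt>"}],
        "max_tokens": 256,
    },
    timeout=600,
)
print(resp.json())  # same chat.completion structure as the output above
```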