This repository has been archived by the owner on Jun 24, 2024. It is now read-only.
Hi! Thanks for building this awesome library. I'm trying to figure out how many tokens/s it generates so I can compare its performance to other libraries like https://github.com/abetlen/llama-cpp-python, which gives you a debug output like the following:

Output generated in 266.13 seconds (1.50 tokens/s, 398 tokens, context 627)

Is there any way to get a similar output when running the repl command? Thanks!

Replies: 3 comments
- You can't easily get this with the repl command.
- These two examples both demonstrate piping some inference stats to the terminal: inference.rs & vicuna-chat.rs (see the sketch after the replies).
- Perfect, thanks!
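The inference.rs and vicuna-chat.rs examples print the inference statistics the library reports once generation finishes. If you only need a rough tokens-per-second figure, you can also do the bookkeeping yourself from whatever per-token callback drives your output. The sketch below is a standard-library-only illustration assuming a callback-style streaming API like the one those examples use; the `Throughput` helper and the fake token loop are hypothetical stand-ins, not part of the llm crate.

```rust
use std::io::Write;
use std::time::{Duration, Instant};

/// Tracks token throughput. Call `on_token` from inside whatever per-token
/// callback the inference library exposes, then `summary` once it finishes.
struct Throughput {
    start: Instant,
    tokens: usize,
}

impl Throughput {
    fn new() -> Self {
        Self { start: Instant::now(), tokens: 0 }
    }

    fn on_token(&mut self, token: &str) {
        // Echo the token as it streams in, then count it.
        print!("{token}");
        std::io::stdout().flush().ok();
        self.tokens += 1;
    }

    fn summary(&self) -> String {
        let seconds = self.start.elapsed().as_secs_f64();
        let rate = self.tokens as f64 / seconds.max(f64::EPSILON);
        format!(
            "Output generated in {seconds:.2} seconds ({rate:.2} tokens/s, {} tokens)",
            self.tokens
        )
    }
}

fn main() {
    // Stand-in for a real generation loop: in practice you would call
    // `stats.on_token(t)` from the closure handed to the library's infer call.
    let mut stats = Throughput::new();
    for t in ["Hello", ",", " world", "!"] {
        std::thread::sleep(Duration::from_millis(50)); // simulate generation latency
        stats.on_token(t);
    }
    println!("\n{}", stats.summary());
}
```

Run as-is, it prints the stand-in tokens and then a summary line in the same shape as the llama-cpp-python output quoted in the question.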