This repository has been archived by the owner on Jun 24, 2024. It is now read-only.
Hi! Thanks for building this awesome library. I'm trying to figure out how many tokens/s it generates so I can compare its performance to other libraries like https://github.com/abetlen/llama-cpp-python, which gives you a debug output like the following:

Output generated in 266.13 seconds (1.50 tokens/s, 398 tokens, context 627)

Is there any way to get a similar output when running the repl command? Thanks!

Replies: 3 comments
- You can't easily get this with the repl command.
- These two examples both demonstrate piping some inference stats to the terminal: inference.rs & vicuna-chat.rs (see the sketch after the replies).
- Perfect, thanks!
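The inference.rs and vicuna-chat.rs examples print the inference statistics the library reports once generation finishes. If you only need a rough tokens-per-second figure, you can also do the bookkeeping yourself from whatever per-token callback drives your output. The sketch below is a standard-library-only illustration assuming a callback-style streaming API like the one those examples use; the `Throughput` helper and the fake token loop are hypothetical stand-ins, not part of the llm crate.

```rust
use std::io::Write;
use std::time::{Duration, Instant};

/// Tracks token throughput. Call `on_token` from inside whatever per-token
/// callback the inference library exposes, then `summary` once it finishes.
struct Throughput {
    start: Instant,
    tokens: usize,
}

impl Throughput {
    fn new() -> Self {
        Self { start: Instant::now(), tokens: 0 }
    }

    fn on_token(&mut self, token: &str) {
        // Echo the token as it streams in, then count it.
        print!("{token}");
        std::io::stdout().flush().ok();
        self.tokens += 1;
    }

    fn summary(&self) -> String {
        let seconds = self.start.elapsed().as_secs_f64();
        let rate = self.tokens as f64 / seconds.max(f64::EPSILON);
        format!(
            "Output generated in {seconds:.2} seconds ({rate:.2} tokens/s, {} tokens)",
            self.tokens
        )
    }
}

fn main() {
    // Stand-in for a real generation loop: in practice you would call
    // `stats.on_token(t)` from the closure handed to the library's infer call.
    let mut stats = Throughput::new();
    for t in ["Hello", ",", " world", "!"] {
        std::thread::sleep(Duration::from_millis(50)); // simulate generation latency
        stats.on_token(t);
    }
    println!("\n{}", stats.summary());
}
```

Run as-is, it prints the stand-in tokens and then a summary line in the same shape as the llama-cpp-python output quoted in the question.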