estimate token use before sending openai completions #1112
base: main
Conversation
Signed-off-by: Jeffrey Martin <[email protected]>
The issue was identified when attempting to validate this linked comment.
Many good questions, will respond. We would love this for …
This is implemented in …
The idea is good. Some possible issues around max_tokens, context_len and deprefix.
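The parameters named there interact roughly as follows; this is only an illustrative sketch of the clamping arithmetic under discussion, not the PR's code, and the variable names are borrowed from the comment above:

```python
def clamp_completion_budget(max_tokens: int, context_len: int, prompt_tokens: int) -> int:
    # The completion can only use whatever room the prompt leaves in the context window.
    available = context_len - prompt_tokens
    if available <= 0:
        # The prompt alone already fills the model's context; no completion is possible.
        raise ValueError(f"prompt needs {prompt_tokens} tokens, context is only {context_len}")
    return min(max_tokens, available)
```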
Signed-off-by: Jeffrey Martin <[email protected]>
LGTM
Signed-off-by: Jeffrey Martin <[email protected]>
…in calculation; add numbers to exception message; adjust algebra to avoid false firing
…ax_completion_tokens to max output length if known
Noted a coupla things:
Signed-off-by: Jeffrey Martin <[email protected]>
Updates in 95452e0, create a consolidated method to support …
lgtm
coupla queries, maybe could be addressed in code comments
main pause point re: consistency of how some over-length conditions are handled
Signed-off-by: Jeffrey Martin <[email protected]>
Signed-off-by: Jeffrey Martin <[email protected]>
I'm still good with this and it seems important to get landed soonest.
578c3f9 to f56502a
once there's a clean merge with main, this looks good to go
Testing has indicated that many …
When setting max_tokens for services compliant with the OpenAI python client, the value passed to the client needs to be reduced so that it fits within the model's supported context length, inclusive of the tokens in the prompt request.

This revision validates the available context space before attempting to request inference, with the following behaviors:
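For context on the estimation step described above (an illustrative sketch only, not the code in this PR; the use of the tiktoken library and the fallback encoding are assumptions):

```python
# Illustrative: count the tokens a prompt will consume before sending the request.
import tiktoken

def estimate_prompt_tokens(prompt: str, model_name: str) -> int:
    try:
        encoding = tiktoken.encoding_for_model(model_name)
    except KeyError:
        # Unrecognized model names fall back to a general-purpose encoding (assumption).
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(prompt))
```

The resulting count, compared against the model's context length, decides whether the request can be sent at all and how far max_tokens must be reduced.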
Please review with an eye to desired runtime behavior: should the run be terminated if a prompt from a probe exceeds the context length of the target model, or should the run continue and simply log the skipped Attempt?

Error reported as a 400 response when the context length of the model is exceeded:
Test example:
high_tokens_config.yaml:
Logged error: