-
Notifications
You must be signed in to change notification settings - Fork 257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable messages api #581
Enable messages api #581
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
be sure to tag @SBrandeis and @Wauplin for a review when this is ready to review (given we'd use the generated types)
interesting, point @julien-c, reading @Wauplin implementation here huggingface/huggingface_hub#2094 it's very complete! , in that regard, it makes sense to also support similar API with the js client in another PR, where we can use @xenova jinja + hub , to build the chat_template in case the user wants to use chat completion with a simple text generation model |
@radames I agree focus should be put on handling the EDIT: especially the part where I try to handle Inference Endpoints URLs for which I don't have the model_id / chat_template. It's quite complex logic -hence the figure- for very little impact IMO, retrospectively 😕 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think it'd be best to separate completely chatCompletion
from the textGeneration
method. Both are generating text but their API is very different. In the new spec-ed types we differenciated them (see chat completion here, and text generation here).
In particular parameters/options are not sent on the same level. For all HF tasks, we have a parameters
key that is a mapping with all parameters. For chat completion, we wanted to mimic OpenAI's API which sets all the options at the root level. Also, output types are completely different and we don't benefit from combining them IMO.
Also, chat completion URL is not the same for models served on serverless Inference API. Usually, the url is https://api-inference.huggingface.co/models/{model_id}
. For chat-completion, it's https://api-inference.huggingface.co/models/{model_id}/v1/chat/completions
. In huggingface_hub
, this is handled here. This rule also applies for TGI-served models where /
serves text-generation API while /v1/chat/completions
serves chat-completion API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the awesome review + help + feedback from the python implem, @Wauplin
Hi @Wauplin, thank you for the feedback. I think separating |
indeed |
Yes but we need to handle #584 first (and also using the types for validation would be great) |
shall we wait then to split this into |
yes it should definitely be split. Maybe make a separate PR / start from scratch? Maybe we should pass |
Any news on this? that is quite needed imo |
hi @gary149 I'll do a new PR today, this one will be throw away in favor of a split creating a new |
close in favor of #645 |
Supersede #581. Thanks to @Wauplin, I can import the types from "@huggingface/tasks" I've followed the pattern for `textGeneration` and `textGenerationStream`. --------- Co-authored-by: coyotte508 <[email protected]> Co-authored-by: Julien Chaumond <[email protected]>
Address #574: The idea is to utilize the endpoint as the base client, requiring users to specify the model name.
model
withendpointUrl
becausemodel
is a required parameter for all APIs, including TGI.choices?: Choice[];
is included in the StreamResponse when using messageAPI, maybe we can have a way to differentiate the input.options
are not needed and often throw a backend error, so it's preferable to exclude them onstreamRequest
Considering the requirement for the
model
parameter, we might not needTaskWithNoAccessTokenNoModel
WDYT?