Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable messages api #581

Closed
wants to merge 19 commits into from
Closed

Enable messages api #581

wants to merge 19 commits into from

Conversation

radames
Copy link
Contributor

@radames radames commented Mar 26, 2024

Address #574: The idea is to utilize the endpoint as the base client, requiring users to specify the model name.

  • I had to replace model with endpointUrl because model is a required parameter for all APIs, including TGI.
  • Types are still incomplete, but choices?: Choice[]; is included in the StreamResponse when using messageAPI, maybe we can have a way to differentiate the input.
  • For custom streams endpoints, options are not needed and often throw a backend error, so it's preferable to exclude them on streamRequest

Considering the requirement for the model parameter, we might not need TaskWithNoAccessTokenNoModel WDYT?

@radames radames requested a review from coyotte508 March 26, 2024 05:09
@radames radames requested a review from vvmnnnkv as a code owner March 26, 2024 05:09
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Member

@julien-c julien-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be sure to tag @SBrandeis and @Wauplin for a review when this is ready to review (given we'd use the generated types)

@radames
Copy link
Contributor Author

radames commented Mar 26, 2024

interesting, point @julien-c, reading @Wauplin implementation here huggingface/huggingface_hub#2094 it's very complete! , in that regard, it makes sense to also support similar API with the js client in another PR, where we can use @xenova jinja + hub , to build the chat_template in case the user wants to use chat completion with a simple text generation model

@Wauplin
Copy link
Contributor

Wauplin commented Mar 27, 2024

interesting, point @julien-c, reading @Wauplin implementation here huggingface/huggingface_hub#2094 it's very complete! , in that regard, it makes sense to also support similar API with the js client in another PR, where we can use @xenova jinja + hub , to build the chat_template in case the user wants to use chat completion with a simple text generation model

@radames I agree focus should be put on handling the /v1/chat/completions route (i.e. server-side template rendering). The solution with jinja+hub is nice but in the end, only useful for a very few models AFAIK. Typically the microsoft/DialoGPT-*** family because all major chat-based LLMs are already supported and served with TGI. If I had to start over the PR in huggingface_hub I wouldn't start with the client-side rendering 😬

EDIT: especially the part where I try to handle Inference Endpoints URLs for which I don't have the model_id / chat_template. It's quite complex logic -hence the figure- for very little impact IMO, retrospectively 😕

coyotte508
coyotte508 previously approved these changes Mar 28, 2024
@coyotte508 coyotte508 requested review from Wauplin and SBrandeis March 28, 2024 23:22
Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think it'd be best to separate completely chatCompletion from the textGeneration method. Both are generating text but their API is very different. In the new spec-ed types we differenciated them (see chat completion here, and text generation here).

In particular parameters/options are not sent on the same level. For all HF tasks, we have a parameters key that is a mapping with all parameters. For chat completion, we wanted to mimic OpenAI's API which sets all the options at the root level. Also, output types are completely different and we don't benefit from combining them IMO.

Also, chat completion URL is not the same for models served on serverless Inference API. Usually, the url is https://api-inference.huggingface.co/models/{model_id}. For chat-completion, it's https://api-inference.huggingface.co/models/{model_id}/v1/chat/completions. In huggingface_hub, this is handled here. This rule also applies for TGI-served models where / serves text-generation API while /v1/chat/completions serves chat-completion API.

packages/inference/test/HfInference.spec.ts Outdated Show resolved Hide resolved
@coyotte508 coyotte508 dismissed their stale review March 29, 2024 11:28

following @Wauplin's review

Copy link
Member

@julien-c julien-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the awesome review + help + feedback from the python implem, @Wauplin

@radames
Copy link
Contributor Author

radames commented Mar 30, 2024

Hi @Wauplin, thank you for the feedback. I think separating chatCompletion and textGeneration is a great idea!
Also, @coyotte508, is it the plan to remove all types from the inference package and unify by using types from tasks? it's already possible to do this
import type { TextGenerationInput, TextGenerationStreamOutput } from "@huggingface/tasks";

@julien-c
Copy link
Member

julien-c commented Apr 2, 2024

Also, @coyotte508, is it the plan to remove all types from the inference package and unify by using types from tasks?

indeed

@coyotte508
Copy link
Member

Yes but we need to handle #584 first (and also using the types for validation would be great)

@radames
Copy link
Contributor Author

radames commented Apr 2, 2024

shall we wait then to split this into chatCompletion and textGeneration ?

@coyotte508
Copy link
Member

yes it should definitely be split.

Maybe make a separate PR / start from scratch? Maybe we should pass model along only for chatCompletion as well? (still we can move from model to endpointUrl)

@gary149
Copy link
Collaborator

gary149 commented Apr 29, 2024

Any news on this? that is quite needed imo

@radames
Copy link
Contributor Author

radames commented Apr 29, 2024

Any news on this? that is quite needed imo

hi @gary149 I'll do a new PR today, this one will be throw away in favor of a split creating a new chatCompletion method that will be cleaner than using it all with textGeneration , as well as matching the python api

@radames
Copy link
Contributor Author

radames commented May 1, 2024

close in favor of #645

@radames radames closed this May 1, 2024
coyotte508 added a commit that referenced this pull request May 13, 2024
Supersede #581. Thanks to @Wauplin, I can import the types from
"@huggingface/tasks"
I've followed the pattern for `textGeneration` and
`textGenerationStream`.

---------

Co-authored-by: coyotte508 <[email protected]>
Co-authored-by: Julien Chaumond <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants