Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding automatic_embedding_tei_inference_endpoints.ipynb #5

Merged
merged 59 commits into from
Feb 19, 2024

Conversation

datavistics
Copy link
Collaborator

What does this PR do?

Adds a automatic_embedding_tei_inference_endpoints notebook.

Fixes #4

Who can review?

Feel free to tag members/contributors who may be interested in your PR.
@MKhalusova

@MKhalusova MKhalusova self-requested a review December 18, 2023 14:20
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@@ -0,0 +1,962 @@
{
Copy link
Contributor

@MKhalusova MKhalusova Dec 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hub 0.20.1 is already available, so we probably don't need this block of code anymore. Let's remove it then.


Reply via ReviewNB

@@ -0,0 +1,962 @@
{
Copy link
Contributor

@MKhalusova MKhalusova Dec 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. You can briefly explain what the repos are. E.g. DATASET_IN is where the text data is, DATASET_OUT is where we'll store the embeddings.
  2. Normally, it's good to define the global variables all at the beginning, but I think it may make the notebook a bit easier to follow if the variables related to Inference Endpoint creation are declared in that section.

Reply via ReviewNB

@@ -0,0 +1,962 @@
{
Copy link
Contributor

@MKhalusova MKhalusova Dec 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we convert to pandas and to dict?


Reply via ReviewNB

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unanswered.

@@ -0,0 +1,962 @@
{
Copy link
Contributor

@MKhalusova MKhalusova Dec 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #11.            revision="7302ac470bed880590f9344bfeee32ff8722d0e5",

What is this revision of?


Reply via ReviewNB

@@ -0,0 +1,962 @@
{
Copy link
Contributor

@MKhalusova MKhalusova Dec 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Its possible to have inputs that exceed the context. In those scenarios its up to you have to handle them. In my case Id like to truncate rather than have an error. Let's test that it works." =>

"You may have inputs that exceed the context. In such scenarios, it's up to you to handle them. In my case, I'd like to truncate rather than have an error. Let's test that it works."


Reply via ReviewNB

@@ -0,0 +1,962 @@
{
Copy link
Contributor

@MKhalusova MKhalusova Dec 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"lets pause" -> "let's pause"


Reply via ReviewNB

@@ -0,0 +1,962 @@
{
Copy link
Contributor

@MKhalusova MKhalusova Dec 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"I find its easiest" -> "I find it easiest"


Reply via ReviewNB

@@ -0,0 +1,962 @@
{
Copy link
Contributor

@MKhalusova MKhalusova Dec 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the user you in this case? Maybe, clarify a little. E.g. "I'm uploading the dataset to the account of the user I logged in with, but....."


Reply via ReviewNB

@@ -0,0 +1,962 @@
{
Copy link
Contributor

@MKhalusova MKhalusova Dec 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe say a few words, why you would want to delete the endpoint?


Reply via ReviewNB

Copy link
Contributor

@MKhalusova MKhalusova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really cool and super useful example! Thanks for working on this.
I think we can bring it to the next level by polishing the text a bit. I found some typos, and it would be cool to add some additional explanations in a couple of places.
Please add the image to the https://huggingface.co/datasets/huggingface/cookbook-images, so that we don't store it in this repo.

@datavistics
Copy link
Collaborator Author

Hey @MKhalusova , I made some changes. Can you let me know what you think?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@@ -0,0 +1,845 @@
{
Copy link
Contributor

@MKhalusova MKhalusova Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's reserve H1 (#) header only for the notebook title, and make this an H2 (##)


Reply via ReviewNB

@@ -0,0 +1,845 @@
{
Copy link
Contributor

@MKhalusova MKhalusova Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #1.    # !pip install -q -r ../requirements.txt

I think we can remove this comment


Reply via ReviewNB

@@ -0,0 +1,845 @@
{
Copy link
Contributor

@MKhalusova MKhalusova Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #5.    MAX_WORKERS = 5  # This is for how many async workers you want. Choose based on the model and HW

What is HW?


Reply via ReviewNB

@@ -0,0 +1,845 @@
{
Copy link
Contributor

@MKhalusova MKhalusova Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's say "Hugging Face offers a number of GPUs that you can choose from" instead of "We have"


Reply via ReviewNB

@@ -0,0 +1,845 @@
{
Copy link
Contributor

@MKhalusova MKhalusova Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's


Reply via ReviewNB

@MKhalusova
Copy link
Contributor

There are a few important things that we need to address in this PR to be able to merge it:

  1. Add the notebook to the _toctree.yml
  2. Add the notebook to the list of the latest notebooks in the index.md. As this is the most recent addition, put it at the top of the list.
  3. Move the notebook under notebooks/en
  4. Make sure H1 (#) header is only used for the notebook title, and all the other headers are H2(##) or H3(###)
  5. Right after the first header (notebook title), add yourself as an author, like this: _Authored by: [Your Name](https://huggingface.co/your_profile)_ Feel free to use either your Hugging Face profile, or GitHub profile, it's up to you which one to link.

I also left a few minor comments. Once these are addressed, we can merge.

Comment on lines +3 to +4
- local: automatic_embedding_tei_inference_endpoints
title: Automatic Embeddings with TEI through Inference Endpoints
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index should be the top one here. Feel free to move it under index

@MKhalusova MKhalusova merged commit fe78cf0 into huggingface:main Feb 19, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Automatic Embeddings: API driven embeddings with TEI and Inference Endpoints
6 participants