
docs: add notebook for openai finetuning with kili #1259

Merged · 3 commits into master on Jun 12, 2023

Conversation

Jonas1312 (Contributor)

No description provided.

@review-notebook-app

Check out this pull request on ReviewNB. See visual diffs & provide feedback on Jupyter Notebooks.

recipes/Kili_fine_tuning_tutorial.ipynb (outdated)

@Jonas1312 (Contributor, Author) · Jun 7, 2023

I think that here you should explain that each line of this file is an input text and its class, separated by a "|".

Also, I would change the code like this instead:

with open(...) as f:
    lines = f.readlines()

# each line is "<text>|<class>"; strip the trailing newline before splitting
lines = [line.strip().split("|") for line in lines]
content_array = [line[0] for line in lines]
categories_array = [line[1] for line in lines]
external_id_array = [f"text_{i}" for i, _ in enumerate(content_array)]
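For illustration, the parsing above applied to a hypothetical line (the real file's contents are not shown in this thread):

# hypothetical sample line; real data comes from the file opened above
line = "Stocks rallied on Friday|business\n"
text, category = line.strip().split("|")
# text == "Stocks rallied on Friday", category == "business"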




Reply (Contributor):

1. Addressed in the NB.

2. Will address this in version 2.0 of the NB.

@Jonas1312 (Contributor, Author) · Jun 7, 2023

I don't think the organization id is needed; not sure, though.



@Jonas1312 (Contributor, Author) · Jun 7, 2023

You can say that, as of now, only GPT-3 models can be fine-tuned: https://platform.openai.com/docs/models/gpt-3



Reply (Contributor):

Added this to the NB.
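For reference, a minimal sketch of what launching such a fine-tune could look like with the openai Python client of that era (v0.x API); the file path and the choice of "ada" as base model are assumptions:

import openai

# upload the prepared JSONL training file (path is an assumption)
training_file = openai.File.create(
    file=open("kili-fine-tune.jsonl", "rb"),
    purpose="fine-tune",
)

# at the time, only the GPT-3 base models (ada, babbage, curie, davinci)
# could be fine-tuned through this endpoint
fine_tune = openai.FineTune.create(
    training_file=training_file["id"],
    model="ada",
)
print(fine_tune["id"])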

@Jonas1312 (Contributor, Author) · Jun 7, 2023

Line #5.    text: """

Why "text:" and not, for example, "predicted class:"?



Reply (Contributor):

This is based on what I found in OpenAI's documentation.
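For context, a minimal sketch of a prompt/completion record in the format OpenAI's fine-tuning guide described at the time; the "text:" suffix mirrors the notebook's template, while the example contents are assumptions:

import json

record = {
    # the fixed "text:" suffix acts as a separator the model learns to complete after
    "prompt": "Stocks rallied on Friday\n\ntext:",
    # the guide recommended starting completions with a space
    "completion": " business",
}
# one such JSON object per line of the training JSONL file
print(json.dumps(record))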

@Jonas1312 (Contributor, Author) · Jun 7, 2023

Line #22.        exported_class = (nested_field[0]['name'])

I would just use:

exported_class = json_data["latestLabel"]["jsonResponse"]["CLASSIFICATION_JOB"]["categories"][0]["name"]

or use label parsing.
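If assets without a label can occur, a guarded variant of that lookup might look like this (a sketch; only the key path is taken from the comment above):

# fall back to None when an asset has no label or no category
latest_label = json_data.get("latestLabel") or {}
json_response = latest_label.get("jsonResponse") or {}
categories = json_response.get("CLASSIFICATION_JOB", {}).get("categories", [])
exported_class = categories[0]["name"] if categories else None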



Reply (Contributor):

To be done in version 2.0 of the NB.

@Jonas1312 (Contributor, Author) · Jun 7, 2023

I would print a few lines of this file to make sure the structure is correct.



Reply (Contributor):

Done. Added a line with !head /content/kili-fine-tune.jsonl
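A pure-Python alternative to the shell command, in case the notebook is run outside an environment with !-magics (the path is the one quoted above):

# print the first few records of the training file to eyeball the structure
with open("/content/kili-fine-tune.jsonl") as f:
    for _ in range(5):
        line = f.readline()
        if not line:
            break
        print(line.rstrip())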

@Jonas1312 (Contributor, Author) · Jun 7, 2023

Line #2.    sample_text2 = "Most Of Beijing To Be Tested For COVID-19 Amid Lockdown Worry. While only 70 cases have been found since the outbreak surfaced, authorities have followed a “zero-COVID” approach to try to prevent a further spread of the virus."

Remove this line?



Reply (Contributor):

Commented out. This is a sample line in case someone wants to experiment a bit more.

@Jonas1312 (Contributor, Author) · Jun 7, 2023

Explain that we start at 801 to assign new external ids.

You can use the same code I gave above in the notebook; no dict is needed for that.
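A minimal sketch of that, reusing content_array from the earlier snippet (assuming 800 assets already exist, per the 801 offset discussed here):

# continue numbering after the 800 existing assets so external ids stay unique
external_id_array = [f"text_{i}" for i in range(801, 801 + len(content_array))]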



@Jonas1312 (Contributor, Author) · Jun 7, 2023

I would have expected a better score after fine-tuning; weird.



Reply (Contributor):

I agree. Maybe I'll run it by Jean L, just in case.

@Jonas1312 marked this pull request as ready for review on June 12, 2023 at 11:30
@Jonas1312 changed the title from "docs: upload notebook" to "docs: add notebook for openai finetuning with kili" on Jun 12, 2023
@Jonas1312 merged commit 1187e1a into master on Jun 12, 2023
@Jonas1312 deleted the docs/openai-llm-finetuning branch on June 12, 2023 at 11:32