docs: add notebook for openai finetuning with kili #1259
@@ -0,0 +1,2081 @@ | |||
{ |
I think you should explain here that each line of this file contains an input text and its class, separated by a "|".
Also, I would change the code like this instead:
with open(...) as f:
    lines = f.readlines()
lines = [line.rstrip("\n").split("|") for line in lines]
content_array = [line[0] for line in lines]
categories_array = [line[1] for line in lines]
external_id_array = [f"text_{i}" for i, _ in enumerate(content_array)]
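For reference, a minimal self-contained sketch of the parsing suggested above, assuming every line has the form `text|class`. The `parse_lines` helper and the sample lines are hypothetical and only stand in for the notebook's real file, which is not shown here. It also assumes the class is always the last field, so it splits from the right.

```python
from typing import List, Tuple


def parse_lines(lines: List[str]) -> Tuple[List[str], List[str], List[str]]:
    """Split 'text|class' lines into contents, categories and external IDs."""
    # Strip the trailing newline first, otherwise it ends up in the class name.
    # rsplit with maxsplit=1 keeps any "|" inside the text itself intact
    # (assumption: the class is always the last field on the line).
    rows = [line.rstrip("\n").rsplit("|", 1) for line in lines if line.strip()]
    content_array = [row[0] for row in rows]
    categories_array = [row[1] for row in rows]
    external_id_array = [f"text_{i}" for i, _ in enumerate(content_array)]
    return content_array, categories_array, external_id_array


# Hypothetical sample lines standing in for the real file contents.
sample_lines = [
    "Stocks rally after earnings|BUSINESS\n",
    "Team wins the final|SPORTS\n",
]
contents, categories, external_ids = parse_lines(sample_lines)
print(categories)
print(external_ids)
```

Splitting from the right is a design choice for this sketch, not something the notebook is known to do; a plain `split("|")` behaves the same as long as the text never contains the separator.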
1. Addressed in the NB.
2. Will address this in version 2.0 of the NB.
@@ -0,0 +1,2081 @@ | |||
{ |
You can say that, as of now, only GPT-3 models can be fine-tuned: https://platform.openai.com/docs/models/gpt-3
Added this to NB
@@ -0,0 +1,2081 @@ | |||
{ |
This is based on what I found in OpenAI's documentation.
@@ -0,0 +1,2081 @@ | |||
{ |
Line #22. exported_class = (nested_field[0]['name'])
I would just use exported_class = json_data["latestLabel"]["jsonResponse"]['CLASSIFICATION_JOB']['categories'][0]["name"]
or use the label parsing
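A small sketch of the direct lookup suggested above, wrapped so that a missing label does not raise. The key path is taken from the comment; the `extract_class` helper and the sample asset are hypothetical, and a real export may contain more fields than shown.

```python
from typing import Any, Dict, Optional


def extract_class(asset: Dict[str, Any], job_name: str = "CLASSIFICATION_JOB") -> Optional[str]:
    """Return the first category name of a classification job, or None if absent."""
    try:
        return asset["latestLabel"]["jsonResponse"][job_name]["categories"][0]["name"]
    except (KeyError, IndexError, TypeError):
        # Unlabeled asset, empty category list, or a None value along the path.
        return None


# Hypothetical exported asset, reduced to the keys used in the lookup.
json_data = {
    "latestLabel": {
        "jsonResponse": {
            "CLASSIFICATION_JOB": {"categories": [{"name": "POLITICS"}]}
        }
    }
}
exported_class = extract_class(json_data)
print(exported_class)
```

Returning `None` for unlabeled assets is one possible choice; skipping such assets or raising an explicit error would work equally well, depending on how the fine-tuning file is built.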
To be done in version 2.0 of the NB
@@ -0,0 +1,2081 @@ | |||
{ |
Done. Added a line with !head /content/kili-fine-tune.jsonl
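Outside Colab, where the `!head` shell escape may not be available, the same preview can be done in plain Python. This is only a sketch: the `head` helper, the temporary file name, and the sample records are hypothetical, and the `prompt`/`completion` keys are an assumption based on OpenAI's legacy fine-tuning format rather than something confirmed by this thread.

```python
import json
import os
import tempfile
from itertools import islice
from typing import List


def head(path: str, n: int = 10) -> List[str]:
    """Return the first n lines of a text file, without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in islice(f, n)]


# Hypothetical records; the prompt/completion keys are an assumption.
records = [
    {"prompt": "Stocks rally after earnings", "completion": " BUSINESS"},
    {"prompt": "Team wins the final", "completion": " SPORTS"},
]

# Write one JSON object per line (JSONL) to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "sample-fine-tune.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

preview = head(path, n=1)
print(preview)
```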
@@ -0,0 +1,2081 @@ | |||
{ |
Line #2. sample_text2 = "Most Of Beijing To Be Tested For COVID-19 Amid Lockdown Worry. While only 70 cases have been found since the outbreak surfaced, authorities have followed a “zero-COVID” approach to try to prevent a further spread of the virus."
remove this line?
Commented out. This is a sample line in case someone wants to experiment a bit more.
@@ -0,0 +1,2081 @@ | |||
{ |
Explain that we start at 801 to assign new external IDs.
You can use the same code I gave above in the notebook; no dict is needed for that.
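A possible dict-free version, reusing the `text_{i}` naming from the earlier comment and starting the counter at 801. The `make_external_ids` helper and the sample texts are hypothetical; the sketch assumes the lower-numbered IDs are already taken by previously uploaded assets.

```python
from typing import List


def make_external_ids(contents: List[str], start: int = 0) -> List[str]:
    """Build one external ID per text, numbering from `start`."""
    return [f"text_{i}" for i, _ in enumerate(contents, start=start)]


# Hypothetical new texts to upload after the existing assets.
new_contents = ["First new headline", "Second new headline"]
new_external_ids = make_external_ids(new_contents, start=801)
print(new_external_ids)
```

Using `enumerate(..., start=801)` keeps the IDs aligned with the list order, so the contents and their IDs can be passed as two parallel lists.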
@@ -0,0 +1,2081 @@ | |||
{ |
I agree. Maybe I'll run it by Jean L, just in case.
No description provided.