Add Prodigy OpenAI project #180

ljvmiranda921 · 2023-02-21T02:46:34Z

This is supplementary material for our Prodigy OpenAI blog post, in which we also benchmark zero-shot and supervised approaches for NER. I'll add more details to the description as we go along.

Description

TODO

Types of change

New Project

Checklist

I confirm that I have the right to submit this contribution under the project's MIT license.
I ran the tests, and all new and existing tests passed.
I ran the update scripts in the .github folder, and all the configs and docs are up-to-date.

koaning · 2023-02-22T10:07:01Z

integrations/prodigy_openai/openai/templates/ner_prompt.jinja2

@@ -0,0 +1,19 @@
+From the text below, extract the following entities in the following format:
+{# whitespace #}
+Cell: <comma delimited list of strings>


Mainly asking out of curiosity: is there a reason you decided to write a custom prompt?

ljvmiranda921 · 2023-02-23

This one's mostly for convenience: I was testing different templates and find it a hassle to type each label in the CLI. The current project.yml setup already passes the labels on the command line, but they aren't rendered into the template yet because the template doesn't reference them. I'll update this later.
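For illustration, here is a minimal plain-Python sketch of what a labels-aware prompt could render to. It does not use Jinja. The diff above only shows the first line of ner_prompt.jinja2 and the `Cell: <comma delimited list of strings>` line. The helper name `build_prompt`, the example labels, and placing the input text at the end are all assumptions.

```python
from typing import Sequence

# First line of ner_prompt.jinja2 as shown in the diff; the remaining
# template lines are not reproduced here.
HEADER = "From the text below, extract the following entities in the following format:"


def build_prompt(text: str, labels: Sequence[str]) -> str:
    """Hypothetical helper: emit one "<Label>: <comma delimited list of strings>"
    line per label, mirroring the `Cell:` line in the template.

    Appending the input text after a blank line is an assumed layout.
    """
    label_lines = [f"{label}: <comma delimited list of strings>" for label in labels]
    return "\n".join([HEADER, "", *label_lines, "", text])


print(build_prompt("Hepatocytes were isolated from the liver.", ["Cell", "Tissue"]))
```

Generating one line per label from a single list means the labels passed in project.yml and the labels shown to the model cannot drift apart.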

koaning · 2023-02-22T10:07:45Z

integrations/prodigy_openai/project.yml

+    outputs:
+      - corpus/anem-test_texts.jsonl
+
+  - name: "openai-predict"


Another question out of curiosity: did you happen to keep track of how expensive it was to run this query?

ljvmiranda921 · 2023-02-23

No, I haven't :/ Time-wise, the query ran for about an hour and a half, with intermittent stops due to connection errors (I restarted it with --resume). I haven't tracked how much it cost.
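The run was interrupted by connection errors, so making it restartable takes two pieces: skipping inputs that already have output, and retrying a failing call. The following is a rough sketch of both. The helper names, the JSONL output with a `text` key, and the exponential backoff policy are assumptions. This is not how the recipe's `--resume` flag is implemented.

```python
import json
import time
from pathlib import Path
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def pending_texts(texts: Iterable[str], output_path: Path) -> Iterator[str]:
    """Yield only texts that do not yet have a record in output_path.

    Assumes each output line is a JSON object with a "text" key
    (an illustrative layout, not the recipe's own).
    """
    done = set()
    if output_path.exists():
        with output_path.open(encoding="utf8") as f:
            for line in f:
                if line.strip():
                    done.add(json.loads(line)["text"])
    for text in texts:
        if text not in done:
            yield text


def with_retries(call: Callable[[], T], attempts: int = 5, base_delay: float = 1.0) -> T:
    """Retry a flaky call (e.g. an API request) on ConnectionError,
    waiting base_delay * 2**attempt seconds between tries."""
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("attempts must be at least 1")
```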

ljvmiranda921 self-assigned this Feb 21, 2023

ljvmiranda921 added 4 commits February 21, 2023 13:43

Initial project setup

a4f9f92

Add .gitignore

8733fc3

Add dependencies

9cac816

Include spacy-transformers in dependencies

ab5538d

ljvmiranda921 force-pushed the add/openai-project branch from 429e96b to ab5538d on February 21, 2023 05:43

ljvmiranda921 added 7 commits February 21, 2023 14:30

Add training configuration

9e8cd45

Update project.yml and add commands for training

6fe164e

Add script for converting to JSONL

f36bb9c

Add Prodigy OpenAI related scripts

b3a1c1a

Add initial setup for evaluate gpt script

2157149

Require spaCy to be <3.5.0

950e609

So that converter auto works

Include the actual labels in the prompt

35cc2df

ljvmiranda921 force-pushed the add/openai-project branch from 5e246b9 to 35cc2df on February 22, 2023 00:28

Remove get-dataset dependency on assets

8463a0d

ljvmiranda921 force-pushed the add/openai-project branch from c913869 to 2e8d2b4 on February 22, 2023 03:31

ljvmiranda921 added 4 commits February 22, 2023 13:14

Ship zero-shot predictions from OpenAI

3f75a92

I want to include the test predictions (at least) from OpenAI. So that others can reproduce some stuff.
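Reproducing results from shipped predictions means turning each response in the `Label: <comma delimited list of strings>` format from ner_prompt.jinja2 back into character spans. The following is a minimal sketch, assuming responses follow that format. `parse_response` and `find_spans` are hypothetical names, and this is not necessarily how the recipe parses responses.

```python
import re
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def parse_response(response: str, labels: List[str]) -> Dict[str, List[str]]:
    """Parse lines like "Cell: hepatocytes, neurons" into {label: [phrases]}.
    Lines whose prefix is not a known label are ignored."""
    found: Dict[str, List[str]] = {label: [] for label in labels}
    for line in response.splitlines():
        if ":" not in line:
            continue
        label, _, rest = line.partition(":")
        label = label.strip()
        if label in found:
            found[label].extend(p.strip() for p in rest.split(",") if p.strip())
    return found


def find_spans(text: str, found: Dict[str, List[str]]) -> List[Span]:
    """Locate every case-insensitive occurrence of each phrase in the text."""
    spans = set()
    for label, phrases in found.items():
        for phrase in phrases:
            for m in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
                spans.add((m.start(), m.end(), label))
    return sorted(spans)
```

Plain substring matching also matches inside longer words (e.g. "cell" inside "cells") and does not resolve overlapping spans, so a real converter would need a policy for both.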

Update the project.yml

1a2bfa7

Fix test set name in project.yml

f680695

Update evaluation script to check on spans

76b1834
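Checking on spans is commonly done with exact-match scoring: a predicted entity counts only if its start, end, and label all match a gold entity. The following sketch computes micro-averaged precision, recall, and F1 under that assumption. `span_prf` is a hypothetical name, and the project's evaluation script may match spans differently (spaCy's `Scorer` is one existing alternative).

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def span_prf(gold: List[Set[Span]], pred: List[Set[Span]]) -> Tuple[float, float, float]:
    """Micro-averaged exact-match precision, recall and F1 over documents."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same documents")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # spans present in both
        fp += len(p - g)  # predicted but not in gold
        fn += len(g - p)  # gold but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```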

ljvmiranda921 force-pushed the add/openai-project branch from 2e8d2b4 to 76b1834 on February 22, 2023 05:14

koaning reviewed Feb 22, 2023

Include span information when converting to JSONL

8c984fc

This is useful when we want to hydrate a Prodigy dataset with the data we need.
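To hydrate a Prodigy dataset, each example needs its text plus a `spans` list of character offsets and labels, which is the shape Prodigy uses for NER tasks. The following stdlib sketch starts from character offsets. That input is an assumption about the converter, and `to_prodigy_line` is a hypothetical name.

```python
import json
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def to_prodigy_line(text: str, spans: Iterable[Span]) -> str:
    """Serialize one example as a JSONL line with "text" and a "spans" list
    of {"start", "end", "label"} dicts, sorted by offset."""
    task_spans = []
    for start, end, label in sorted(spans):
        if not 0 <= start < end <= len(text):
            raise ValueError(f"Span ({start}, {end}) is out of bounds")
        task_spans.append({"start": start, "end": end, "label": label})
    return json.dumps({"text": text, "spans": task_spans}, ensure_ascii=False)
```

A file of such lines can then be imported with `prodigy db-in <dataset> <file.jsonl>`.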

ljvmiranda921 force-pushed the add/openai-project branch from d9b0e83 to 8c984fc on February 23, 2023 07:13

Add train-curve command

c16aa00

ljvmiranda921 force-pushed the add/openai-project branch from d26992d to c16aa00 on February 23, 2023 07:45

svlandeg added the enhancement label Feb 23, 2023

ljvmiranda921 added 2 commits February 24, 2023 13:06

Update train-curve command and add a clean command

e011d2d

Add plotext to deps for train-curve

95ccb15

Sync recipes based on v1.12 PR

edb2591

ljvmiranda921 force-pushed the add/openai-project branch from de0fe2f to edb2591 on February 27, 2023 01:55

Add initial description

6127f89

ljvmiranda921 force-pushed the add/openai-project branch from 63c813c to 6127f89 on February 27, 2023 02:50

ljvmiranda921 added 11 commits March 1, 2023 11:51

Add command for ner.openai.correct

cc7a1df

Add NER manual command

02d9486

Make cmd description explicit

f83dc40

Update label names so they map properly in the UI

a7d1345

Create LABELS file to easily pass them in prodigy

ebe0c7f

Make evaluate_gpt more generalisable

0aa592e

Fix incorrect entity label

c361b56

Make assert condition less strict

c041ffc

When downsampling, the labels may not always be equal.

Add filter process for evaluation

295fedf
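If a downsampled test set does not contain every label, requiring the gold and predicted label sets to be identical is too strict. One hypothetical way to relax the check and then filter is shown below. Both sets are checked against the declared labels (for instance, the contents of the LABELS file) rather than against each other. Spans with labels outside the evaluation set are then dropped before scoring. The helper names and this two-step split are assumptions, not the project's evaluate_gpt code.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def check_labels(declared: Set[str], gold_labels: Set[str], pred_labels: Set[str]) -> None:
    """Relaxed check: both label sets must come from the declared scheme,
    but they need not equal each other (a downsampled split may lack labels)."""
    unknown = (gold_labels | pred_labels) - declared
    if unknown:
        raise ValueError(f"Unexpected labels: {sorted(unknown)}")


def filter_spans(docs: List[Set[Span]], keep: Set[str]) -> List[Set[Span]]:
    """Drop spans whose label is not in `keep`."""
    return [{span for span in doc if span[2] in keep} for doc in docs]
```

Filtering predictions to the labels present in the sample stops predictions for unseen labels from counting as false positives. Whether that is desirable depends on what the benchmark is meant to measure.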

Generalize the evaluation script

1a95234

Accept multiple inputs for get_batches

3d7b919
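Accepting multiple inputs for batching can be as simple as chaining the input streams before slicing them into fixed-size batches. The following is a sketch of that idea. `iter_batches` is a hypothetical stand-in for the project's get_batches, whose actual signature is not shown here.

```python
from itertools import chain, islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(*inputs: Iterable[T], batch_size: int = 8) -> Iterator[List[T]]:
    """Chain any number of input streams and yield lists of at most
    batch_size items; the last batch may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    stream = chain.from_iterable(inputs)
    while True:
        batch = list(islice(stream, batch_size))
        if not batch:
            return
        yield batch


print(list(iter_batches(["a", "b"], ["c"], batch_size=2)))  # [['a', 'b'], ['c']]
```

Because this is a generator, inputs are consumed lazily, so large JSONL files can be streamed without loading them into memory.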

ljvmiranda921 force-pushed the add/openai-project branch from 0befd99 to 3d7b919 on April 24, 2023 02:15

svlandeg unassigned ljvmiranda921 Aug 1, 2023

rmitsch closed this Aug 11, 2023
