Commit

Add initial description
ljvmiranda921 committed Feb 27, 2023
1 parent edb2591 commit 63c813c
Showing 2 changed files with 77 additions and 2 deletions.
40 changes: 39 additions & 1 deletion integrations/prodigy_openai/README.md
@@ -1,6 +1,44 @@
<!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->

# 🪐 spaCy Project: Benchmarking OpenAI datasets
# 🪐 spaCy Project: Using Prodigy's OpenAI recipes for a bio NER task

This project showcases Prodigy's OpenAI recipe for named-entity recognition
using the [Anatomical Entity Mention (AnEM)
dataset](https://aclanthology.org/W12-4304/). The dataset contains 11
anatomical entities (e.g., *organ*, *tissue*, *cellular component*, etc.)
based on the Common Anatomy Reference Ontology. The dataset statistics (and
some examples) are shown below:

<!-- TODO: insert dataset statistics -->

In this project, we trained a transformer-based NER model and compared it with the zero-shot
predictions of GPT-3. We wanted to test how large language models fare in a specific domain and
to suggest ways we can leverage them to improve our annotations.

<!-- TODO: insert zero-shot and supervised learning diagrams -->
<!-- TODO: insert results -->

The transformer and zero-shot pipelines are defined by the `ner` and `gpt` workflows, respectively.
To run the `gpt` workflow, make sure to [install Prodigy](https://prodi.gy/docs/install) as well
as a few additional Python dependencies:

```
python -m pip install prodigy -f https://XXXX-XXXX-XXXX-XXXX@download.prodi.gy
python -m pip install -r requirements.txt
```

Replace `XXXX-XXXX-XXXX-XXXX` with your personal Prodigy license key.

Then, create a new API key on openai.com or fetch an existing one. Record the
secret key as well as the organization key and make sure both are available as
environment variables. For instance, set them in a `.env` file in the root
directory:

```
PRODIGY_OPENAI_ORG = "org-..."
PRODIGY_OPENAI_KEY = "sk-..."
```
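
Before running the `gpt` workflow, it can help to confirm that both variables are actually visible. The following is a minimal sketch using only the Python standard library; the helper names `parse_env_text` and `missing_keys` are hypothetical and not part of Prodigy, and the parser assumes the simple `KEY = "value"` layout shown above rather than the full `.env` syntax. How Prodigy itself reads the file may differ.

```python
import os

# The two variable names come from the .env example above.
REQUIRED_KEYS = ("PRODIGY_OPENAI_ORG", "PRODIGY_OPENAI_KEY")


def parse_env_text(text):
    """Parse simple KEY = "value" lines; skip blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in env."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    env = {}
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as handle:
            env = parse_env_text(handle.read())
    # Variables already set in the real environment take precedence.
    env.update(os.environ)
    missing = missing_keys(env)
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("Both OpenAI variables are set.")
```

Run it from the project root so that the relative `.env` path resolves.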


## 📋 project.yml

39 changes: 38 additions & 1 deletion integrations/prodigy_openai/project.yml
@@ -1,4 +1,41 @@
title: "Benchmarking OpenAI datasets"
title: "Using Prodigy's OpenAI recipes for a bio NER task"
description: |
This project showcases Prodigy's OpenAI recipe for named-entity recognition
using the [Anatomical Entity Mention (AnEM)
dataset](https://aclanthology.org/W12-4304/). The dataset contains 11
anatomical entities (e.g., *organ*, *tissue*, *cellular component*, etc.)
based on the Common Anatomy Reference Ontology. The dataset statistics (and
some examples) are shown below:
<!-- TODO: insert dataset statistics -->
In this project, we trained a transformer-based NER model and compared it with the zero-shot
predictions of GPT-3. We wanted to test how large language models fare in a specific domain and
to suggest ways we can leverage them to improve our annotations.
<!-- TODO: insert zero-shot and supervised learning diagrams -->
<!-- TODO: insert results -->
The transformer and zero-shot pipelines are defined by the `ner` and `gpt` workflows, respectively.
To run the `gpt` workflow, make sure to [install Prodigy](https://prodi.gy/docs/install) as well
as a few additional Python dependencies:
```
python -m pip install prodigy -f https://XXXX-XXXX-XXXX-XXXX@download.prodi.gy
python -m pip install -r requirements.txt
```
Replace `XXXX-XXXX-XXXX-XXXX` with your personal Prodigy license key.
Then, create a new API key on openai.com or fetch an existing one. Record the
secret key as well as the organization key and make sure both are available as
environment variables. For instance, set them in a `.env` file in the root
directory:
```
PRODIGY_OPENAI_ORG = "org-..."
PRODIGY_OPENAI_KEY = "sk-..."
```
directories:
- "assets"