Add initial description

explosion · Feb 27, 2023 · 63c813c
1 parent edb2591
commit 63c813c

Showing 2 changed files with 77 additions and 2 deletions.
diff --git a/integrations/prodigy_openai/README.md b/integrations/prodigy_openai/README.md
@@ -1,6 +1,44 @@
 <!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->
 
-# 🪐 spaCy Project: Benchmarking OpenAI datasets
+# 🪐 spaCy Project: Using Prodigy's OpenAI recipes for a bio NER task
+
+This project showcases Prodigy's OpenAI recipe for named-entity recognition
+using the [Anatomical Entity Mention (AnEM)
+dataset](https://aclanthology.org/W12-4304/). The dataset contains 11
+anatomical entity types (e.g., *organ*, *tissue*, *cellular component*)
+based on the Common Anatomy Reference Ontology. The dataset statistics (and
+some examples) are shown below:
+
+<!-- TODO: insert dataset statistics -->
+
+In this project, we trained a transformer-based NER model and compared it with the zero-shot
+predictions of GPT-3. We wanted to test how large language models fare in a specific domain and
+to suggest ways to leverage them to improve our annotations.
+
+<!-- TODO: insert zero-shot and supervised learning diagrams -->
+<!-- TODO: insert results -->
+
+The transformer and zero-shot pipelines are defined by the `ner` and `gpt` workflows, respectively.
+To run the `gpt` workflow, make sure to [install Prodigy](https://prodi.gy/docs/install) as well
+as a few additional Python dependencies:
+
+```
+python -m pip install prodigy -f https://XXXX-XXXX-XXXX-XXXX@download.prodi.gy
+python -m pip install -r requirements.txt
+```
+
+Replace `XXXX-XXXX-XXXX-XXXX` with your personal Prodigy license key.
+
+Then, create a new API key on openai.com or fetch an existing one. Record the
+secret key as well as the organization key and make sure these are available as
+environment variables. For instance, set them in a `.env` file in the root
+directory:
+
+```
+PRODIGY_OPENAI_ORG = "org-..."
+PRODIGY_OPENAI_KEY = "sk-..."
+```
+
 
 ## 📋 project.yml
 

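Note on the `.env` step described in the README hunk above: a quick way to confirm that both variables are visible before running the `gpt` workflow might look like the sketch below. It is only an illustration. The helper names `parse_env_text` and `missing_keys` are hypothetical, the parser handles just the simple `KEY = "value"` form shown in the README, and it makes no claim about how Prodigy itself reads the file.

```python
import os

# Variable names taken from the README; everything else here is illustrative.
REQUIRED_KEYS = ("PRODIGY_OPENAI_ORG", "PRODIGY_OPENAI_KEY")


def parse_env_text(text):
    """Parse simple KEY = "value" lines into a dict, skipping blanks and comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(env):
    """Return the required keys that are absent or empty in the given mapping."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    env = dict(os.environ)
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as f:
            # Values already exported in the shell take precedence over the file.
            env = {**parse_env_text(f.read()), **env}
    missing = missing_keys(env)
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

Run from the project root, the script reports which of the two keys, if any, still need to be set. It never prints the key values themselves.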
diff --git a/integrations/prodigy_openai/project.yml b/integrations/prodigy_openai/project.yml
@@ -1,4 +1,41 @@
-title: "Benchmarking OpenAI datasets"
+title: "Using Prodigy's OpenAI recipes for a bio NER task"
+description: |
+  This project showcases Prodigy's OpenAI recipe for named-entity recognition
+  using the [Anatomical Entity Mention (AnEM)
+  dataset](https://aclanthology.org/W12-4304/). The dataset contains 11
+  anatomical entity types (e.g., *organ*, *tissue*, *cellular component*)
+  based on the Common Anatomy Reference Ontology. The dataset statistics (and
+  some examples) are shown below:
+
+  <!-- TODO: insert dataset statistics -->
+
+  In this project, we trained a transformer-based NER model and compared it with the zero-shot
+  predictions of GPT-3. We wanted to test how large language models fare in a specific domain and
+  to suggest ways to leverage them to improve our annotations.
+
+  <!-- TODO: insert zero-shot and supervised learning diagrams -->
+  <!-- TODO: insert results -->
+
+  The transformer and zero-shot pipelines are defined by the `ner` and `gpt` workflows, respectively.
+  To run the `gpt` workflow, make sure to [install Prodigy](https://prodi.gy/docs/install) as well
+  as a few additional Python dependencies:
+
+  ```
+  python -m pip install prodigy -f https://XXXX-XXXX-XXXX-XXXX@download.prodi.gy
+  python -m pip install -r requirements.txt
+  ```
+
+  Replace `XXXX-XXXX-XXXX-XXXX` with your personal Prodigy license key.
+
+  Then, create a new API key on openai.com or fetch an existing one. Record the
+  secret key as well as the organization key and make sure these are available as
+  environment variables. For instance, set them in a `.env` file in the root
+  directory:
+
+  ```
+  PRODIGY_OPENAI_ORG = "org-..."
+  PRODIGY_OPENAI_KEY = "sk-..."
+  ```
 
 directories:
   - "assets"