UIUC-Chatbot/data-generator

data-generator

input_data

  • audio_transcripts
    • Transcripts of the ECE 120 course
  • gpt-3
    • Question-Answer pairs generated by GPT-3
  • patel_textbook_split
    • The Patel textbook split into 200-word chunks, each overlapping its neighbors by 50 words on each side
  • split_textbook
    • The ECE 120 textbook split into sentences, paragraphs, and sections
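The overlapping split described for patel_textbook_split can be sketched roughly as follows. This is a minimal illustration, not the script used to produce the files: the function name `split_with_overlap`, whitespace tokenization, and the reading of "50 overlapping words" as a 150-word stride are all assumptions.

```python
def split_with_overlap(text, chunk_size=200, overlap=50):
    """Split text into chunks of `chunk_size` words.

    Consecutive chunks share `overlap` words, so each interior chunk
    overlaps its previous and next neighbor by `overlap` words.
    Hypothetical helper; the repository's own splitting code may differ.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    words = text.split()          # naive whitespace tokenization (assumption)
    stride = chunk_size - overlap # how far the window advances each step
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

With the defaults, a 500-word text yields three chunks starting at words 0, 150, and 300. The final chunk may be shorter than 200 words when the text length does not line up with the stride.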

raw_data

  • notes
    • The ECE 120 textbook
  • patel_textbook
    • The Patel textbook

HF

Models taken from Hugging Face (huggingface.co)

  • huggingface.ipynb
    • end2end.json - ThomasSimonini/t5-end2end-question-generation
    • fine_tuned_data.json - mrm8488/t5-base-finetuned-question-generation-ap
    • Dataset Quality - Not Good

bart

Inspired by this paper's second-best model, which pairs a BART question generator with BERT-MRC

  • bart.ipynb
  • Two different BERT-MRC models
    • bespin.json
    • ainze.json
    • Dataset Quality - Not Good

filtering

  • cross_encoder.ipynb
    • Uses this model on HuggingFace.
  • scoring.ipynb
    • Uses BLEU scores
  • Dialogue_RPT_Scoring.ipynb
    • Uses Dialogue RPT to rate the answers against the context in the GPT answer file
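
For readers unfamiliar with BLEU, the kind of score scoring.ipynb relies on can be sketched as below. This is a simplified, single-reference, sentence-level version written in plain Python for illustration; it is not the notebook's code, it applies no smoothing, and the function names `ngram_counts` and `simple_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against one reference.

    Returns a value in [0, 1]. Any n-gram order with zero matches
    makes the whole score 0, which is the standard unsmoothed behavior.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty discourages candidates shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

A generated answer could then be kept or dropped by comparing `simple_bleu(generated, reference)` with a cutoff; any such cutoff is an assumption here, since the README does not state one.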
