Research Project Repo on How Datasets are Cited @ PURRlab, ITU Copenhagen

purrlab/PublicDatasets-MIDL

 
 

PublicDatasets

Research Project Repo on How Datasets are Cited @ PURRlab, ITU Copenhagen

Install the dependencies with pip3 install -r requirements.txt.

This project lets you build tables of dataset mentions from papers published in the Proceedings of Machine Learning Research (https://proceedings.mlr.press/).

Code

  • ArticleOrganizer.ipynb : Runs first. Selects the target venues and downloads their contents locally. Generates ResearchPapers.csv.
  • ArticleAnalayzer.ipynb : Runs second. Requires some configuration to know where in the text to look for dataset mentions. Generates DatasetMentions_Unprocessed.csv, a table that can be further annotated with the Dataset Identifier and Access columns.
  • ArticleVisualizer.ipynb : Runs last, after you have cleaned and annotated the unprocessed file. Generates visualizations.
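
The second step can be pictured with a minimal sketch. This is not the notebook's actual code: the section headings, the dataset names, the sample text, and the find_mentions helper are all hypothetical, chosen only to illustrate what "configuring where in the text to look" could mean. Only the Mention and Notes column names come from this README.

```python
import re

# Hypothetical configuration: which section headings to search, and which
# dataset names to look for. Neither list comes from the repository.
TARGET_SECTIONS = ("Experiments", "Data")
DATASET_NAMES = ("BraTS", "CheXpert")


def find_mentions(sections, names=DATASET_NAMES, targets=TARGET_SECTIONS, window=40):
    """Return one row per dataset-name match found inside the targeted sections."""
    rows = []
    for heading, text in sections.items():
        if heading not in targets:
            continue  # skip sections the configuration does not target
        for name in names:
            for match in re.finditer(re.escape(name), text):
                # Keep some surrounding text so an annotator can check the match.
                start = max(0, match.start() - window)
                end = min(len(text), match.end() + window)
                rows.append(
                    {"Mention": name, "Section": heading, "Notes": text[start:end]}
                )
    return rows


# Made-up paper text, split into sections, for illustration only.
paper = {
    "Introduction": "Prior work used CheXpert.",
    "Experiments": "We train on BraTS and evaluate on CheXpert.",
}
mentions = find_mentions(paper)
for row in mentions:
    print(row)
```

Restricting the search to configured sections keeps passing references, such as the one in the Introduction above, out of the table. The trade-off is that a mention in an unexpected section is missed, which is one reason the output is reviewed by hand.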

Data

  • data/ResearchPapers.csv : A table of the downloaded research papers and their respective venues.
  • data/DatasetMentions_Unprocessed.csv : A table of the dataset mentions found in the downloaded papers, sorted by the paper and venue they occur in. The Mention Style and Mention columns indicate the type of mention and how it occurs in the text. The Notes column records the original context so that an annotator can validate the mention and make corrections if necessary.
  • data/DatasetMentions_Processed.csv : A manually annotated version of DatasetMentions_Unprocessed.csv. Redundant columns were merged and footnote numbers were replaced with the URLs they point to. The example in this repository adds the Dataset Identifier and Access columns used by ArticleVisualizer.ipynb.
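
As a rough illustration of how the annotated table could be summarized, the sketch below counts mentions per Dataset Identifier and per Access value. It is not taken from ArticleVisualizer.ipynb. The Mention Style, Mention, Dataset Identifier, and Access column names come from the descriptions above; the Paper and Venue column names and every row of sample data are assumptions made up for this example.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for data/DatasetMentions_Processed.csv.
# "Paper" and "Venue" are assumed column names; all values are placeholders.
SAMPLE = """Paper,Venue,Mention Style,Mention,Dataset Identifier,Access
paper-a,venue-1,footnote,https://example.org/data,dataset-x,public
paper-b,venue-1,citation,[12],dataset-x,public
paper-c,venue-2,in-text,private cohort,dataset-y,private
"""


def count_by(rows, column):
    """Count how many mention rows share each value of the given column."""
    return Counter(row[column] for row in rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
by_dataset = count_by(rows, "Dataset Identifier")
by_access = count_by(rows, "Access")

print(by_dataset.most_common())
print(by_access.most_common())
```

To run the same counts on the real file, replace io.StringIO(SAMPLE) with an open file handle for data/DatasetMentions_Processed.csv, and adjust the column names if they differ.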

