dblp_survey

Iterates over the entire DBLP database of scientific papers and creates one or more CSV files listing the papers that were published in the given year or later, appeared in the given conferences/journals, and have a title containing any of the given keywords. The CSV output contains the paper title, conference/journal, year, and a URL to the paper. Useful for scientific surveys.
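The selection rule described above can be sketched roughly as follows. This is only an illustration, not the code from dblp_survey.py: the Paper record, its field names, and the matches function are hypothetical, and the sketch assumes that venue names are compared exactly (case-sensitively) while keywords are matched as case-insensitive substrings of the title.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical record type; the field names are illustrative only.
    title: str
    venue: str
    year: int


def matches(paper, min_year, venues, keywords):
    """Return True if the paper passes the year, venue, and keyword filters."""
    if paper.year < min_year:
        return False
    if paper.venue not in venues:  # assumed: exact, case-sensitive venue match
        return False
    title = paper.title.lower()
    # Assumed: a keyword matches if it occurs anywhere in the lowercased title.
    return any(keyword.lower() in title for keyword in keywords)


papers = [
    Paper("Evaluating Neural Rankers", "SIGIR", 2019),
    Paper("A Study of Caches", "SIGIR", 2020),
    Paper("Evaluation of Parsers", "ACL", 2015),
]
selected = [p for p in papers if matches(p, 2017, {"SIGIR", "ACL"}, ["evaluat"])]
print([p.title for p in selected])  # ['Evaluating Neural Rankers']
```

The second paper is dropped because its title contains no keyword, and the third because it is older than the requested year.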

Usage

Install the required Python packages with pip install -r requirements.txt.
Download the DBLP database into the root directory of the repo from this link (if it is dead, let me know; going to dblp.org -> XML Data -> "raw dblp data in a single XML file" should also work). Download both the dblp.xml.gz file (unpack it to dblp.xml) and the dblp.dtd file.
Set up the list of conferences/journals the papers should come from. The script expects it at dblp_survey/inputs/conf_journ.csv (an example is provided with the repo). Use one entry per line; each entry must exactly match the name used in the DBLP XML database. To find those names, open the XML database in a text editor that can handle large files, search for a paper that you are sure comes from the desired conference/journal, and record what you see between the <journal> (journal papers) or <booktitle> (conference papers) tags.
Set up the list of keywords the script will search for in the titles. The script expects it at dblp_survey/inputs/keywords.csv (an example is provided with the repo). Again, use one entry per line. The script performs a case-insensitive exact search, so a keyword only matches titles that contain it literally. Stemming the words you are searching for is therefore strongly recommended. For example, if you are interested in papers on evaluation, use evaluat as a keyword, as it matches evaluation, evaluate, evaluating, etc.
Run the script using python dblp_survey.py <year> --split <split_mode>. <year> is a mandatory argument: the oldest year from which papers will be considered (e.g., 2017 considers papers from 2017 to now). --split <split_mode> is an optional parameter with two possible values: none does not split the papers and outputs a single CSV at dblp_survey/outputs/dblp_survey.csv, while per-venue outputs a separate CSV for each conference/journal you specified in the respective file. The default value is per-venue.
For each paper, the output CSV(s) contain the title, conference/journal, year, and a link. The links are clickable if you import the CSV into Google Sheets and should be clickable in Excel; in LibreOffice they do not seem to be clickable.
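If no editor at hand copes with the full XML file, the unpacking and venue-lookup steps above can also be done with a small standalone helper. The sketch below is not part of the repo; the function names unpack and count_venues are hypothetical. It assumes that each <journal> or <booktitle> element opens and closes on a single line, and it reads the file as plain text, so XML entities such as &amp; appear unresolved, just as they would in a text editor.

```python
import gzip
import re
import shutil
from collections import Counter

# Matches a venue element that opens and closes on the same line.
VENUE_TAG = re.compile(r"<(journal|booktitle)>(.*?)</\1>")


def unpack(gz_path, xml_path):
    """Decompress gz_path to xml_path without loading it all into memory."""
    with gzip.open(gz_path, "rb") as src, open(xml_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def count_venues(xml_path, needle):
    """Count venue strings that contain needle (case-insensitive)."""
    counts = Counter()
    needle = needle.lower()
    # errors="replace" keeps the scan going if a byte is not valid UTF-8.
    with open(xml_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            for match in VENUE_TAG.finditer(line):
                venue = match.group(2)
                if needle in venue.lower():
                    counts[venue] += 1
    return counts


# Example usage (hypothetical search term, file names from the steps above):
# unpack("dblp.xml.gz", "dblp.xml")
# for venue, n in count_venues("dblp.xml", "sigir").most_common(10):
#     print(n, venue)
```

The most frequent spellings printed this way are good candidates for conf_journ.csv, though it is still worth checking one known paper to confirm the venue string.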

Happy surveying!
