The script is automated: you only need to generate the main and header XML files for your article and place them in the outputs and outputs/header folders, respectively. The script does the rest.
Gradle >= 7.0.0
Maven >= 3.6.0
Java 11.x
Python 3.8.x
The command to run the script is:
python3 script.py
To try this script, you first need to generate the Grobid XML main and header files and put them in the corresponding folders (the Grobid commands below show how).
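Before running the script, it can help to confirm that every article has both a main and a header file in place. The following is a minimal sketch, not part of the project: the function name `find_unpaired` is hypothetical, and it assumes that the main and header XML files for one article share the same file name, which may not hold for every Grobid client version.

```python
from pathlib import Path


def find_unpaired(outputs_dir="outputs", header_subdir="header", pattern="*.xml"):
    """Return (missing_header, missing_main) as two sorted lists of file names.

    Assumption: the main file in `outputs_dir` and the header file in
    `outputs_dir/header_subdir` have identical file names for one article.
    """
    main_dir = Path(outputs_dir)
    header_dir = main_dir / header_subdir

    # A missing folder is treated as an empty one instead of raising.
    main_names = {p.name for p in main_dir.glob(pattern)} if main_dir.is_dir() else set()
    header_names = {p.name for p in header_dir.glob(pattern)} if header_dir.is_dir() else set()

    missing_header = sorted(main_names - header_names)  # main file exists, header does not
    missing_main = sorted(header_names - main_names)    # header file exists, main does not
    return missing_header, missing_main
```

Calling `find_unpaired()` from the project root returns two lists. If both are empty, the file names in outputs and outputs/header match.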
I generated the Grobid XML files using the grobid_client repository and a Grobid server.
The commands to build and run the Grobid server are:
foo@bar:~$ ./gradlew clean install
and then:
foo@bar:~$ ./gradlew run
The command to generate the header file with the Grobid client is:
foo@bar:~$ grobid_client --input ~/Documents/OpenData/inputs --output ~/Documents/OpenData/outputs/header --n 20 processHeaderDocument
The command to generate the main file with the Grobid client is:
foo@bar:~$ grobid_client --input ~/Documents/OpenData/inputs --output ~/tmp/out processFulltextDocument
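If you prefer to drive both client calls from Python, the commands above can be wrapped as in the following sketch. It is not part of the project: the helper names `grobid_client_command` and `run_grobid_client` are hypothetical, it assumes the `grobid_client` executable is on your PATH and a Grobid server is already running, and it uses only the flags shown in the commands above.

```python
import os
import subprocess


def grobid_client_command(service, input_dir, output_dir, n=None):
    """Build the grobid_client argument list for one service.

    `service` is e.g. "processHeaderDocument" or "processFulltextDocument".
    """
    cmd = [
        "grobid_client",
        # No shell is involved, so "~" has to be expanded explicitly.
        "--input", os.path.expanduser(input_dir),
        "--output", os.path.expanduser(output_dir),
    ]
    if n is not None:
        cmd += ["--n", str(n)]  # same --n flag as in the header command above
    cmd.append(service)
    return cmd


def run_grobid_client(service, input_dir, output_dir, n=None):
    """Run grobid_client once; raises CalledProcessError on a non-zero exit."""
    subprocess.run(grobid_client_command(service, input_dir, output_dir, n), check=True)
```

For example, `run_grobid_client("processHeaderDocument", "~/Documents/OpenData/inputs", "~/Documents/OpenData/outputs/header", n=20)` issues the same call as the header command above.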
First install Anaconda, then run these commands:
foo@bar:~$ conda config --add channels conda-forge
foo@bar:~$ conda create -n opendata python=3.8 nltk wordcloud matplotlib lxml pytest
foo@bar:~$ conda activate opendata
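Once the environment is active, a quick check like the one below can confirm that the packages named in the `conda create` command are importable. This is a sketch only: the helper names `missing_packages` and `python_matches` are hypothetical, and it assumes each conda package is imported under the same name it is installed as.

```python
import importlib.util
import sys

# Package names taken from the conda create command above.
REQUIRED = ["nltk", "wordcloud", "matplotlib", "lxml", "pytest"]


def missing_packages(names=REQUIRED):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_matches(major=3, minor=8):
    """Return True if the running interpreter has the given major.minor version."""
    return sys.version_info[:2] == (major, minor)
```

An empty list from `missing_packages()` together with `python_matches()` returning `True` suggests the `opendata` environment is set up as described.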
Install Docker first, then run these commands:
foo@bar:~$ docker build -t opendata .
foo@bar:~$ docker run opendata
Worked examples are already included in the project.
The inputs folder contains all the articles used for the example, and the outputs folder contains the XML files generated with Grobid.
The figuresperArticle, listOfLinks and wordClouds folders contain the results produced by the project.
If you use our code or results in your research, please cite our paper:
@article{Opendata,
title = {{Opendata}},
author = {SHENGXING LU},
year = {2023},
doi = {https://zenodo.org/badge/latestdoi/599152914}
}