OpenData

Description

The script is automated: generate the main (full-text) and header XML files for your article with Grobid, place them in the outputs and outputs/header folders respectively, and the script does the rest.

Requirements

Gradle >= 7.0.0

Maven >= 3.6.0

Java 11.x

Python 3.8.x
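As a quick sanity check before installing anything else, the requirements above can be probed from Python. This is a minimal sketch, not part of the project: the function name `check_requirements` is hypothetical, the executable names (`gradle`, `mvn`, `java`) are assumed to be how the tools appear on your PATH, and it only tests for presence, not for the version numbers listed above. The Python check is "at least 3.8", which is looser than the 3.8.x stated in the list.

```python
import shutil
import sys

# Executable names are assumptions; adjust them to match your system.
REQUIRED_TOOLS = ("gradle", "mvn", "java")


def check_requirements(tools=REQUIRED_TOOLS, min_python=(3, 8)):
    """Return a dict mapping each requirement name to True (found) or False."""
    # shutil.which returns None when the executable is not on PATH.
    report = {tool: shutil.which(tool) is not None for tool in tools}
    # Compare only (major, minor) of the running interpreter.
    report["python"] = sys.version_info[:2] >= min_python
    return report


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

A missing `gradle` entry is not necessarily a problem if you only ever build through the `./gradlew` wrapper, which downloads its own Gradle distribution.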

Documentation

The command for running the script is:

python3 script.py

To try the script, first generate a Grobid main (full-text) XML file and a header XML file and put each in its corresponding folder (see the Grobid generation section below).
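Before running the script, it can help to confirm that the XML files are where they are expected. The following is a minimal sketch under stated assumptions: the helper `count_grobid_xml` is hypothetical and not part of script.py, it assumes the folder layout described in this README (main files directly in outputs, header files in outputs/header), and it only counts files ending in `.xml` without checking their content.

```python
from pathlib import Path


def count_grobid_xml(outputs_dir="outputs"):
    """Return (main_count, header_count) of .xml files in the expected folders."""
    outputs = Path(outputs_dir)
    header = outputs / "header"
    # glob("*.xml") is non-recursive, so header files are not counted as main files.
    main_count = len(list(outputs.glob("*.xml"))) if outputs.is_dir() else 0
    header_count = len(list(header.glob("*.xml"))) if header.is_dir() else 0
    return main_count, header_count


if __name__ == "__main__":
    main_count, header_count = count_grobid_xml()
    print(f"main XML files: {main_count}, header XML files: {header_count}")
```

If either count is zero, the Grobid generation step has probably not been run yet, or its output went to a different folder.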

Installation & execution instructions

Grobid generation

I generated the Grobid XML files using the grobid_client repository and a Grobid server.

The commands for building and starting the Grobid server are:

foo@bar:~$ ./gradlew clean install

and then:

foo@bar:~$ ./gradlew run

The command for generating the header file with the Grobid client is:

foo@bar:~$ grobid_client --input ~/Documents/OpenData/inputs --output ~/Documents/OpenData/outputs/header --n 20 processHeaderDocument

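The command for generating the main (full-text) file with the Grobid client is shown below. As written it saves to ~/tmp/out, so either point --output at the outputs folder or copy the results there before running the script: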

foo@bar:~$ grobid_client --input ~/Documents/OpenData/inputs --output ~/tmp/out processFulltextDocument
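The two client commands above differ only in the service name, the output folder, and the optional `--n` value, so they can be assembled by one small helper. This is a sketch, not the author's tooling: `build_grobid_command` is a hypothetical name, and it uses only the flags that already appear in the commands above. It builds the argument list without running anything, because actually invoking the client needs a running Grobid server.

```python
import os
import shlex


def build_grobid_command(service, input_dir, output_dir, n=None):
    """Build the grobid_client argument list for one service."""
    cmd = [
        "grobid_client",
        "--input", os.path.expanduser(input_dir),    # expand a leading ~
        "--output", os.path.expanduser(output_dir),
    ]
    if n is not None:
        # --n is passed exactly as in the header command above.
        cmd += ["--n", str(n)]
    cmd.append(service)
    return cmd


if __name__ == "__main__":
    header_cmd = build_grobid_command(
        "processHeaderDocument",
        "~/Documents/OpenData/inputs",
        "~/Documents/OpenData/outputs/header",
        n=20,
    )
    # Print a copy-pasteable shell command instead of executing it.
    print(shlex.join(header_cmd))
```

To execute the command from Python, the list can be handed to `subprocess.run(header_cmd, check=True)` once the server is up.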

Anaconda environment

First install Anaconda, then run these commands:

foo@bar:~$ conda config --add channels conda-forge

foo@bar:~$ conda create -n opendata python=3.8 nltk wordcloud matplotlib lxml pytest

foo@bar:~$ conda activate opendata

Docker environment

Install Docker first, then run these commands:

foo@bar:~$ docker build -t opendata .

foo@bar:~$ docker run opendata

Running Examples

Running examples are already included in the project.

The inputs folder contains all articles used for the example, and the outputs folder contains the XML files generated with Grobid. The figuresperArticle, listOfLinks and wordClouds folders contain the results produced by the script.
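To give a feel for what can be read out of a Grobid XML file, here is a minimal sketch that counts figures and collects link targets from a TEI document. It is an illustration only and not how script.py is implemented: the function `summarize_tei` and the inline sample are hypothetical, it assumes links appear as `ptr` elements with a `target` attribute, it skips `figure` elements marked `type="table"`, and it uses the standard library rather than lxml to stay dependency-free.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Tiny hand-written sample, not real Grobid output.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <figure xml:id="fig_0"><head>Figure 1</head></figure>
    <figure type="table" xml:id="tab_0"><head>Table 1</head></figure>
    <p>Code at <ptr target="https://example.org/repo"/></p>
  </body></text>
</TEI>"""


def summarize_tei(xml_text):
    """Return the number of non-table figures and the list of link targets."""
    root = ET.fromstring(xml_text)
    figures = [
        fig for fig in root.findall(".//tei:figure", TEI_NS)
        if fig.get("type") != "table"  # assumption: tables are tagged this way
    ]
    links = [
        ptr.get("target")
        for ptr in root.findall(".//tei:ptr[@target]", TEI_NS)
    ]
    return {"figures": len(figures), "links": links}


if __name__ == "__main__":
    print(summarize_tei(SAMPLE))
```

For a real file, read its text with `Path(...).read_text(encoding="utf-8")` and pass it to `summarize_tei`.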

Preferred citation

If you use our code or results in your research, please cite our paper:

@article{Opendata,
  title   = {{Opendata}},
  author  = {SHENGXING LU},
  year    = {2023},
  doi     = {https://zenodo.org/badge/latestdoi/599152914}
}
