dataset-scripts: scripts for dataset creation and cleaning
training-dataset-scripts: scripts for tokenization and abstraction
fairseq-scripts: scripts for running model training with fairseq on the Agave cluster
placeholder: Python egg for tokenization