The source code for this model is forked from the CodeXGLUE-Repository, with some small changes applied; we follow the same instructions for fine-tuning and inference.
- CodeBERT/preprocessing: Contains the preprocessing techniques applied to the Funcom dataset. Some of the techniques are adopted from previous work.
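The exact steps live in CodeBERT/preprocessing; purely as an illustration, the kind of cleaning typically applied to Funcom-style code/comment pairs looks like the sketch below (the specific rules here are assumptions, not necessarily the ones used in this repository):

```python
import re

def clean_comment(javadoc: str) -> str:
    """Illustrative cleaning of a Javadoc comment (assumed steps):
    strip markup, keep only the first sentence, drop punctuation,
    and lowercase the result."""
    text = re.sub(r"<[^>]+>", " ", javadoc)              # drop HTML tags
    text = re.sub(r"\{@\w+([^}]*)\}", r"\1", text)       # unwrap {@code ...} etc.
    first = re.split(r"(?<=[.!?])\s", text.strip())[0]   # first sentence only
    first = re.sub(r"[^A-Za-z0-9\s]", " ", first)        # remove punctuation
    return re.sub(r"\s+", " ", first).strip().lower()

print(clean_comment("<p>Returns the {@code userName}. Second sentence.</p>"))
# → returns the username
```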
The source code for this model is forked from the NeuralCodeSum-Repository. We follow the same steps for training and testing the model.
- NeuralCodeSum/preprocessing: In addition to the preprocessing techniques applied for CodeBERT, we added further techniques, e.g., token splitting, for this model.
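Token splitting in code-summarization pipelines usually means breaking identifiers into subtokens. A minimal sketch of one common scheme (camelCase, snake_case, and digit boundaries — an assumption; the actual rules are in NeuralCodeSum/preprocessing):

```python
import re

def split_token(token: str) -> list[str]:
    """Split an identifier into lowercase subtokens on snake_case,
    camelCase/PascalCase, acronym, and digit boundaries
    (an assumed scheme for illustration)."""
    sub = []
    for part in re.split(r"_+", token):  # snake_case first
        # acronym runs, capitalized/lowercase words, and digit runs
        sub += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [s.lower() for s in sub]

print(split_token("parseHTTPResponse2JSON"))
# → ['parse', 'http', 'response', '2', 'json']
print(split_token("my_var_name"))
# → ['my', 'var', 'name']
```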
We used the open-source implementation from the code2seq-repo.
- code2seq/JavaExtractor: Modified dataset-building and AST-generation files. Original repo: LRNavin/AutoComments
- code2seq/preproc: Dataset preprocessing folder (part of AST generation), with slight modifications in code2seq/preproc/feature_extractor.py
- code2seq/code2seq_commands.ipynb: Notebook containing the Funcom data preprocessing steps for code2seq, the study result analysis, and the statistical significance test for BLEU scores
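A common significance test for BLEU differences between two systems is paired bootstrap resampling over per-sample scores; the sketch below shows that idea under stated assumptions (the function name, toy scores, and the use of per-sentence scores are illustrative — the notebook holds the actual test used):

```python
import random

def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Paired bootstrap test over per-sample metric scores (assumed
    setup): resample test indices with replacement and count how
    often system A's aggregate score beats system B's."""
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / n_resamples  # fraction of resamples where A > B

# toy per-sample BLEU scores: A dominates B on every sample
a = [0.31, 0.28, 0.35, 0.30, 0.29, 0.33]
b = [0.25, 0.27, 0.26, 0.24, 0.28, 0.25]
print(paired_bootstrap(a, b))  # → 1.0
```

A result near 1.0 (or near 0.0) indicates the score difference is unlikely to be a sampling artifact.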
This folder contains all the categories selected by each annotator, as well as the final category for each sample after resolving conflicts.
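One simple way to surface which samples need conflict resolution is a majority vote over annotator labels, flagging ties for discussion. This is a hypothetical scheme for illustration only (the category names below are made up; the folder stores the actual per-sample outcomes):

```python
from collections import Counter

def resolve_category(labels):
    """Return the majority category among annotator labels, or None
    to flag a conflict that needs manual resolution (hypothetical
    scheme; category names here are illustrative)."""
    top, freq = Counter(labels).most_common(1)[0]
    return top if freq > len(labels) / 2 else None

print(resolve_category(["what", "what", "why"]))  # → what
print(resolve_category(["what", "why", "how"]))   # → None (conflict)
```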