The script takes two corpora of Java methods as input: output_1.csv and output_2.csv for the student model (the two files are combined automatically) and training.txt for the teacher model. It automatically identifies the best-performing model based on its N-value and then evaluates the selected model on the test set extracted from output_*.csv.
Because the training corpus differs between the instructor-provided dataset and our own dataset, the results are stored in separate files, results_student_model.json and results_teacher_model.json, so the two runs can be told apart.
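
The selection step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: it assumes methods are already tokenized into lists of strings, and it uses next-token accuracy as the selection metric, which the README does not specify. The helpers `train_ngram`, `accuracy`, `select_best_n`, and `save_results` are hypothetical names; only `min_ngram`, `max_ngram`, and the `results_[student/teacher]_model.json` naming come from this document.

```python
import json
from collections import Counter, defaultdict


def train_ngram(methods, n):
    """Count which token follows each (n-1)-token context."""
    counts = defaultdict(Counter)
    for tokens in methods:
        padded = ["<s>"] * (n - 1) + tokens
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            counts[context][padded[i]] += 1
    return counts


def accuracy(counts, methods, n):
    """Fraction of tokens predicted correctly by the most frequent follower.

    Assumption: accuracy stands in for whatever metric the script really uses.
    """
    hits = total = 0
    for tokens in methods:
        padded = ["<s>"] * (n - 1) + tokens
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            followers = counts.get(context)
            if followers and followers.most_common(1)[0][0] == padded[i]:
                hits += 1
            total += 1
    return hits / total if total else 0.0


def select_best_n(train, val, min_ngram, max_ngram):
    """Train one model per N and keep the N with the highest score."""
    scores = {n: accuracy(train_ngram(train, n), val, n)
              for n in range(min_ngram, max_ngram + 1)}
    best_n = max(scores, key=scores.get)
    return best_n, scores


def save_results(kind, best_n, scores, directory="."):
    """Write results_<kind>_model.json; kind is 'student' or 'teacher'."""
    path = f"{directory}/results_{kind}_model.json"
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"best_n": best_n, "scores": scores}, fh, indent=2)
    return path
```

Calling `select_best_n(train, val, 1, 5)` and passing the result to `save_results("student", ...)` would produce a file named like the ones this README describes. The JSON layout shown here (`best_n`, `scores`) is illustrative only.
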
- Install Python 3.9+ locally
- Clone the repository to your workspace:
    ~ $ git clone https://github.com/WhittJS/ngram-recommender.git
- Navigate into the repository:
    ~ $ cd ngram-recommender
    ~/ngram-recommender $
- Set up a virtual environment and activate it:
    ~/ngram-recommender $ python -m venv ./venv/
  For macOS/Linux:
    ~/ngram-recommender $ source venv/bin/activate
    (venv) ~/ngram-recommender $
  For Windows:
    ~\ngram-recommender $ .\venv\Scripts\activate.bat
    (venv) ~\ngram-recommender $
- To install the required packages:
    (venv) ~/ngram-recommender $ pip install -r requirements.txt
- Generate new JSON files based on student_model.pkl and teacher_model.pkl:
    python ngram_recommender.py
- To retrain either model, delete the file of the one you want to train and rerun the above command.
- Edit the min_ngram and max_ngram values in the train_test_model function to train on ngrams within specified parameters.

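
The retraining step in the list above (delete a model file, then rerun the script) suggests that a saved model is reused whenever its file exists. A minimal sketch of that pattern follows, assuming the models are stored with Python's `pickle` module, as the `.pkl` extension implies. `load_or_train` and `train_fn` are hypothetical names and are not taken from the repository.

```python
import pickle
from pathlib import Path


def load_or_train(model_path, train_fn):
    """Return the pickled model if it exists; otherwise train and save one."""
    path = Path(model_path)
    if path.exists():
        # A saved model is present, so skip training entirely.
        with path.open("rb") as fh:
            return pickle.load(fh)
    # No saved model: train from scratch and cache the result on disk.
    model = train_fn()
    with path.open("wb") as fh:
        pickle.dump(model, fh)
    return model
```

With this shape, a call such as `load_or_train("student_model.pkl", train_student)` only invokes the (hypothetical) `train_student` function when student_model.pkl is missing, which matches the delete-and-rerun workflow described above.
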
The assignment report is available in the file Assignment_Report.pdf.