WhittJS/ngram-recommender

ngram-recommender

The script takes two corpora of Java methods as input: output_1.csv and output_2.csv, which are combined automatically and used for the student model, and training.txt, which is used for the teacher model. For each corpus, it trains n-gram models over a range of N values and automatically selects the best-performing one. It then evaluates the selected model on the test set extracted from output_*.csv. Because the two models are trained on different corpora (the instructor-provided dataset and our own dataset), the results are stored in separate files named results_[student/teacher]_model.json.
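The selection step can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names train_ngram, perplexity, and select_best_n are hypothetical, and both the add-one smoothing and the use of perplexity as the selection metric are assumptions. Only the min_ngram and max_ngram parameter names come from the script.

```python
from collections import Counter, defaultdict
import math


def train_ngram(sequences, n):
    """Count continuations: context tuple of n-1 tokens -> Counter of next tokens."""
    counts = defaultdict(Counter)
    for tokens in sequences:
        padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            counts[context][padded[i]] += 1
    return counts


def perplexity(counts, sequences, n, vocab_size):
    """Perplexity under add-one smoothing (assumed; the repo may score differently)."""
    log_prob, total = 0.0, 0
    for tokens in sequences:
        padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            seen = counts.get(context, Counter())
            p = (seen[padded[i]] + 1) / (sum(seen.values()) + vocab_size)
            log_prob += math.log(p)
            total += 1
    return math.exp(-log_prob / total)


def select_best_n(train, valid, min_ngram=2, max_ngram=5):
    """Train one model per N in [min_ngram, max_ngram]; keep the lowest perplexity."""
    # Vocabulary: training tokens, the end marker, and one slot for unseen tokens.
    vocab_size = len({t for seq in train for t in seq} | {"</s>"}) + 1
    scores = {}
    for n in range(min_ngram, max_ngram + 1):
        counts = train_ngram(train, n)
        scores[n] = perplexity(counts, valid, n, vocab_size)
    best_n = min(scores, key=scores.get)
    return best_n, scores
```

Here each element of train and valid is one tokenized Java method (a list of token strings). The selected N would then be used to evaluate the model on the held-out test set.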

Installation:

  1. Install Python 3.9+ locally
  2. Clone the repository to your workspace:
~ $ git clone https://github.com/WhittJS/ngram-recommender.git
  3. Navigate into the repository:
~ $ cd ngram-recommender
~/ngram-recommender $
  4. Set up a virtual environment and activate it:
~/ngram-recommender $ python -m venv ./venv/

For macOS/Linux:

~/ngram-recommender $ source venv/bin/activate
(venv) ~/ngram-recommender $ 

For Windows:

~\ngram-recommender $ .\venv\Scripts\activate.bat
(venv) ~\ngram-recommender $ 
  5. Install the required packages:
(venv) ~/ngram-recommender $ pip install -r requirements.txt

Running the Program

  1. Generate new JSON files based on student_model.pkl and teacher_model.pkl:
python ngram_recommender.py
  2. To retrain either model, delete its .pkl file (student_model.pkl or teacher_model.pkl) and rerun the command above.
    • To change which n-gram sizes are trained, edit the min_ngram and max_ngram values in the train_test_model function.
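
Deleting a .pkl file triggers retraining because the script reuses a saved model when one exists. A minimal sketch of that load-or-train pattern is shown below. The helper name load_or_train and the train_fn callback are hypothetical and are not taken from the repository, which may organize this logic differently.

```python
import pickle
from pathlib import Path


def load_or_train(model_path, train_fn):
    """Return the pickled model at model_path, training and saving it if absent."""
    path = Path(model_path)
    if path.exists():
        # A saved model is present, so reuse it and skip training.
        with path.open("rb") as f:
            return pickle.load(f)
    # No saved model (for example, the .pkl was deleted): train and cache it.
    model = train_fn()
    with path.open("wb") as f:
        pickle.dump(model, f)
    return model
```

For example, load_or_train("student_model.pkl", train_student) would train only on the first run, where train_student stands in for whatever function builds the student model.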

Report

The assignment report is available in the file Assignment_Report.pdf.
