🎬 Project by TomMakesThings, rogerchenrc and laviniafr 🎬
This is a natural language processing (NLP) group project in which we tested different NLP techniques and model architectures and built a CI/CD pipeline to train and deploy a multi-label classifier. The classifier was trained on a dataset of movie descriptions to predict the best-fitting genre(s) from 12 possible values: Drama, Comedy, Action, Crime, Thriller, Romance, Horror, Adventure, Mystery, Family, Fantasy and Sci-Fi. The state of the best trained model was then saved to file and deployed on a custom-built web server. For more information, see our GitHub Pages site.
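To illustrate what "multi-label" means here, the following is a minimal sketch of how one raw score per genre could be turned into a list of predicted genres. It is not the project's actual inference code: the label order, the `predict_genres` helper, the sigmoid activation and the 0.5 threshold are all assumptions made for the example.

```python
import math

# The 12 genres named above. The order is an assumption for this sketch;
# a real model defines its own label order.
GENRES = ["Drama", "Comedy", "Action", "Crime", "Thriller", "Romance",
          "Horror", "Adventure", "Mystery", "Family", "Fantasy", "Sci-Fi"]


def sigmoid(x):
    """Map a raw score to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_genres(logits, threshold=0.5):
    """Hypothetical helper: return every genre whose probability meets the
    threshold, most likely first. If none does, fall back to the single
    highest-scoring genre so a prediction is always returned."""
    if len(logits) != len(GENRES):
        raise ValueError("expected one score per genre")
    scored = sorted(((sigmoid(z), g) for z, g in zip(logits, GENRES)),
                    reverse=True)
    chosen = [genre for prob, genre in scored if prob >= threshold]
    return chosen or [scored[0][1]]
```

Because each genre is scored independently, a description can receive several genres at once, which is what distinguishes this setup from a single-label (softmax) classifier.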
Conda environment:
To ensure all team members could execute the code during development, the project was developed in a shared conda environment. This environment has been saved as a YAML file, environment.yml, which is included in the repository. To recreate this environment:
- Download the code in the main repository from Code ⇨ Download ZIP
- Extract the contents of the zip
- Open the Anaconda prompt and navigate to the folder of the extracted code, e.g.
cd Downloads/Movie-Genre-Predictor
- Enter
conda env create -f environment.yml
, where environment.yml is the path to the environment file
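Once created, the environment has to be activated with conda activate followed by its name, which conda takes from the top-level name field of the YAML file. As a small sketch, the name can be read out with the standard library alone. The `conda_env_name` helper is hypothetical, and it assumes the name sits on a single unindented line without a trailing comment.

```python
import re


def conda_env_name(yaml_text):
    """Hypothetical helper: return the value of the top-level `name:` key
    of a conda environment file, or None if there is no such key."""
    for line in yaml_text.splitlines():
        # re.match anchors at the start of the line, so indented keys
        # (which are not top-level) are ignored.
        match = re.match(r"name:\s*(\S.*?)\s*$", line)
        if match:
            return match.group(1).strip("'\"")
    return None


# Usage, assuming environment.yml is in the current directory:
#   from pathlib import Path
#   print("conda activate", conda_env_name(Path("environment.yml").read_text()))
```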
To run the classifier:
- From the Anaconda prompt, run
python Web_App/flaskr/main.py
to start the web application
To run the Jupyter notebooks:
- From the Anaconda prompt, run
jupyter notebook
- In the Jupyter file browser that opens, navigate to the notebook you want to run