This project implements a simple movie recommendation system using natural language processing (NLP) techniques and machine learning. It uses the TMDb 5000 Movies and TMDb 5000 Credits datasets to generate recommendations based on movie tags.
The system processes movie metadata to create a comprehensive set of tags for each movie. These tags are then vectorized and used to calculate the cosine similarity between movies. Given a movie, the system recommends the titles with the highest similarity scores.
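The vectorize-and-compare step can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the function names `vectorize`, `cosine_similarity`, and `recommend` are hypothetical, the tag strings are invented, and a real implementation would more likely rely on a library vectorizer.

```python
import math
from collections import Counter


def vectorize(tags: str) -> Counter:
    """Turn a space-separated tag string into a bag-of-words count vector."""
    return Counter(tags.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty tag set is similar to nothing
    return dot / (norm_a * norm_b)


def recommend(title: str, tags_by_title: dict, top_n: int = 5) -> list:
    """Return the top_n titles whose tags are most similar to `title`."""
    target = vectorize(tags_by_title[title])
    scores = [
        (cosine_similarity(target, vectorize(tags)), other)
        for other, tags in tags_by_title.items()
        if other != title
    ]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scores[:top_n]]


# Hypothetical tag strings, not taken from the dataset.
sample = {
    "Space Saga": "space alien war action pilot",
    "Star Patrol": "space alien action rescue",
    "Quiet Garden": "romance garden family drama",
}
print(recommend("Space Saga", sample, top_n=2))
# ['Star Patrol', 'Quiet Garden']
```

"Star Patrol" ranks first because it shares three tags with "Space Saga", while "Quiet Garden" shares none and scores 0.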
- Merges movie and credits data on the movie title.
- Processes data to extract relevant information: genres, keywords, cast, crew, and overview.
- Creates a unified set of tags for each movie.
- Stems the tags so that variants of a word (for example, "loving" and "loved") reduce to a common root.
- Calculates cosine similarity between movie vectors.
- Recommends movies based on similarity scores.
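The tag-building steps above might look roughly like the sketch below. It assumes the metadata columns hold JSON-encoded lists of objects with a `name` key, as in the public TMDb 5000 CSV files. The helper names (`names`, `director`, `build_tags`), the cutoff of three cast members, the choice to keep only directors from the crew, and the removal of inner spaces are all illustrative assumptions. Merging and stemming are left out to keep the example dependency-free.

```python
import json
from typing import Optional


def names(json_text: str, limit: Optional[int] = None) -> list:
    """Pull the "name" field out of a JSON-encoded list of objects."""
    items = json.loads(json_text)
    # Remove inner spaces so "Science Fiction" becomes a single token.
    return [item["name"].replace(" ", "") for item in items[:limit]]


def director(crew_json: str) -> list:
    """Keep only crew members whose job is Director."""
    return [
        member["name"].replace(" ", "")
        for member in json.loads(crew_json)
        if member.get("job") == "Director"
    ]


def build_tags(movie: dict) -> str:
    """Join overview words and extracted names into one lowercase tag string."""
    parts = (
        movie["overview"].split()
        + names(movie["genres"])
        + names(movie["keywords"])
        + names(movie["cast"], limit=3)  # top-billed cast only (assumed cutoff)
        + director(movie["crew"])
    )
    return " ".join(parts).lower()


# Hand-made record for illustration; not a real dataset row.
movie = {
    "overview": "A pilot defends Earth",
    "genres": '[{"id": 1, "name": "Science Fiction"}]',
    "keywords": '[{"id": 2, "name": "alien invasion"}]',
    "cast": '[{"name": "Ann Lee"}, {"name": "Bo Kim"}]',
    "crew": '[{"name": "Cy Poe", "job": "Director"}, {"name": "Di Ray", "job": "Editor"}]',
}
print(build_tags(movie))
# a pilot defends earth sciencefiction alieninvasion annlee bokim cypoe
```

Collapsing multi-word names into single tokens keeps "Science Fiction" from matching unrelated movies on the word "science" alone.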
- Clone the repository:
  git clone https://github.com/irvincardoza/movie-recommendation-system.git
- Set up a virtual environment and install the dependencies.
- Ensure you have the necessary NLTK data:
  import nltk
  nltk.download('stopwords')
- Run the script to generate recommendations:
  python movie.py
- To be available soon