The aim of this project is to categorise movies into genres by analysing their Arabic subtitles with machine learning techniques.
1- Data cleaning: pre-process the Arabic text (remove diacritics, punctuation and repeated characters).
2- Text extraction:
- Remove stop words.
- Stem the words with the Khoja stemmer (http://zeus.cs.pacificu.edu/shereen/research.htm), which is run from the command line and called from Python.
- Calculate the frequency of every word in each genre, to be used later in the classification phase.
3- Classification.
4- Testing.
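The cleaning in step 1 can be sketched with the standard library alone. This is a minimal sketch, not the project's own code: the diacritic range (U+064B to U+0652 plus superscript alef U+0670), the inclusion of tatweel (U+0640), and the rule that only runs of three or more identical characters count as "repeated" are all assumptions. The last rule avoids breaking words with a legitimate double letter such as "الله".

```python
import re
import string

# Arabic diacritics (fathatan U+064B through sukun U+0652, superscript alef
# U+0670); tatweel U+0640 is also stripped here as an assumption.
ARABIC_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670\u0640]")

# ASCII punctuation plus the Arabic comma, semicolon and question mark.
PUNCTUATION = set(string.punctuation) | {"\u060C", "\u061B", "\u061F"}
PUNCT_TABLE = str.maketrans({ch: " " for ch in PUNCTUATION})

# Three or more repeats of one character, e.g. an elongated word like "جمييييل".
REPEATED_CHARS = re.compile(r"(.)\1{2,}")


def clean_arabic(text: str) -> str:
    """Remove diacritics, punctuation and repeated characters from Arabic text."""
    # Diacritics go first so they cannot hide a run of repeated letters.
    text = ARABIC_DIACRITICS.sub("", text)
    text = text.translate(PUNCT_TABLE)
    text = REPEATED_CHARS.sub(r"\1", text)
    # Collapse the whitespace left behind by removed punctuation.
    return " ".join(text.split())
```

For example, `clean_arabic("مرحبا!!! ، كيف؟")` returns `"مرحبا كيف"`.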
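The stop word removal and per-genre word counts from step 2 might look like the sketch below. The input is assumed to be `(genre, tokens)` pairs whose tokens were already cleaned and stemmed with the Khoja stemmer, which is not reproduced here. `ARABIC_STOP_WORDS` is a tiny hypothetical set for illustration only. In practice it could be replaced by a fuller list, such as the Arabic list in NLTK's stopwords corpus.

```python
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop word set used only for illustration.
ARABIC_STOP_WORDS = {"في", "من", "على", "إلى", "عن", "أن", "هذا", "هذه", "و"}


def remove_stop_words(tokens, stop_words=ARABIC_STOP_WORDS):
    """Drop tokens that appear in the stop word set."""
    return [tok for tok in tokens if tok not in stop_words]


def genre_word_frequencies(documents):
    """Count how often each word occurs in each genre.

    `documents` is an iterable of (genre, tokens) pairs; the tokens are
    assumed to be cleaned and stemmed already.
    """
    freqs = defaultdict(Counter)
    for genre, tokens in documents:
        freqs[genre].update(remove_stop_words(tokens))
    return dict(freqs)
```

The result maps each genre to a `Counter`, which is the per-genre frequency table that the classification phase consumes.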
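The README does not say which classifier steps 3 and 4 use. One natural fit for per-genre word frequencies is multinomial Naive Bayes, sketched below as an assumption rather than as the project's method. It applies Laplace smoothing (`alpha`) and uses uniform genre priors, since every genre has the same number of subtitles. Words never seen in training are skipped for every genre alike. The `accuracy` helper shows one simple way to test the model on held-out subtitles.

```python
import math
from collections import Counter


def train_naive_bayes(genre_freqs, alpha=1.0):
    """Turn per-genre word counts into Laplace-smoothed log-probabilities.

    genre_freqs maps genre -> Counter of word counts. Priors are uniform,
    assuming every genre has the same number of training subtitles.
    """
    vocab = set()
    for counts in genre_freqs.values():
        vocab.update(counts)
    model = {}
    for genre, counts in genre_freqs.items():
        total = sum(counts.values()) + alpha * len(vocab)
        model[genre] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return model


def classify(model, tokens):
    """Return the genre whose word distribution best explains the tokens."""
    def score(genre):
        log_probs = model[genre]
        # Out-of-vocabulary tokens are ignored for all genres equally.
        return sum(log_probs[tok] for tok in tokens if tok in log_probs)
    return max(model, key=score)


def accuracy(model, labelled_docs):
    """Fraction of (genre, tokens) pairs that are classified correctly."""
    correct = sum(classify(model, toks) == genre for genre, toks in labelled_docs)
    return correct / len(labelled_docs)
```

Summing log-probabilities instead of multiplying raw probabilities keeps long subtitle files from underflowing to zero.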
Required Python libraries:
- NumPy
- Pandas
- NLTK
- Matplotlib
You will also need Jupyter Notebook installed to run the notebook.
The Arabic subtitles were downloaded from http://subscene.com/. The dataset contains 20 subtitles for each genre and is stored in the subtitles directory.
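Before cleaning, the subtitle files have to be reduced to their dialogue text. The sketch below assumes standard `.srt` files laid out as `subtitles/<genre>/*.srt`, which is a hypothetical layout. It drops cue numbers and timestamp lines and keeps the spoken lines. Because Arabic subtitles are often saved as either UTF-8 or Windows-1256, it tries both encodings. That choice is also an assumption.

```python
import re
from pathlib import Path

# Matches SRT timing lines such as "00:00:01,000 --> 00:00:02,500".
TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}")


def read_srt_text(path):
    """Return only the dialogue lines of an .srt file, joined by spaces."""
    raw = Path(path).read_bytes()
    # Try UTF-8 (with optional BOM) first, then Windows-1256.
    for encoding in ("utf-8-sig", "cp1256"):
        try:
            text = raw.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    else:
        raise ValueError(f"could not decode {path}")
    lines = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, cue numbers and timestamp lines.
        if not line or line.isdigit() or TIMESTAMP.search(line):
            continue
        lines.append(line)
    return " ".join(lines)


def load_corpus(root="subtitles"):
    """Yield (genre, text) pairs from a hypothetical root/<genre>/*.srt layout."""
    for genre_dir in sorted(Path(root).iterdir()):
        if genre_dir.is_dir():
            for srt in sorted(genre_dir.glob("*.srt")):
                yield genre_dir.name, read_srt_text(srt)
```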
git clone https://github.com/Nourahussein/Movie-classfiction-pased-on-it-s-Arabic-subtitle
cd Movie-classfiction-pased-on-it-s-Arabic-subtitle