Which US Democratic Presidential Nominee said this? Warren? Biden? Sanders?

Text Classification of quotes from candidates vying to be the Democratic presidential nominee for the 2020 US presidential election.

All data was extracted from debates between the candidates. I built an NLP classification model to identify who said what for a subset of unlabeled data.

Methodology

The quotes go through basic text-preprocessing steps:

Stopword removal
Punctuation removal
Lemmatization
Tokenization into unigrams
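
The steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the notebook's actual code: the `STOPWORDS` set and the `LEMMAS` lookup are tiny hypothetical stand-ins for what nltk's stopword corpus and a lemmatizer such as `WordNetLemmatizer` would provide, and the order of the steps is an assumption.

```python
import string

# Hypothetical, deliberately tiny stopword list (assumption);
# nltk.corpus.stopwords would supply a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "we", "in", "for"}

# Toy lemma lookup standing in for a real lemmatizer (assumption).
LEMMAS = {"workers": "worker", "families": "family", "jobs": "job"}


def preprocess(quote: str) -> list:
    """Return cleaned unigram tokens for one quote."""
    # Punctuation removal (after lowercasing).
    cleaned = quote.lower().translate(str.maketrans("", "", string.punctuation))
    # Tokenization into unigrams by splitting on whitespace.
    tokens = cleaned.split()
    # Stopword removal.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Lemmatization via the toy lookup; unknown words pass through unchanged.
    return [LEMMAS.get(t, t) for t in tokens]


tokens = preprocess("We are fighting for the workers, and the families!")
print(tokens)  # ['fighting', 'worker', 'family']
```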

To prepare the data for modeling, I engineered features based on counts of text components such as characters, words, and punctuation marks.
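
As an illustration of count-based features, here is a small sketch. The function name `count_features` and the exact set of features are assumptions; the README only says that counts of characters, words, punctuation, and similar components were used.

```python
import string


def count_features(quote: str) -> dict:
    """Compute simple count-based features for one quote (illustrative only)."""
    words = quote.split()
    return {
        # Total number of characters, including spaces and punctuation.
        "char_count": len(quote),
        # Number of whitespace-separated words.
        "word_count": len(words),
        # Number of ASCII punctuation characters.
        "punct_count": sum(ch in string.punctuation for ch in quote),
    }


features = count_features("Yes, we can!")
print(features)  # {'char_count': 12, 'word_count': 3, 'punct_count': 2}
```

Features like these can be stacked next to TF-IDF or bag-of-words columns before the data is passed to a classifier.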

The text classification is done using Supervised & Semi-Supervised techniques. The following models were explored:

Regularized Logistic Regression
Random Forest
XGBoost
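
A minimal sketch of the supervised setup with the first of these models, assuming TF-IDF unigram features fed into scikit-learn's `LogisticRegression` (which applies L2 regularization by default). The toy sentences and the `candidate_a` / `candidate_b` labels are invented placeholders, not debate data, and the hyperparameters are defaults rather than the values used in the notebooks.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented placeholder training data (assumption), not real debate quotes.
quotes = [
    "we need a wealth tax on large fortunes",
    "a wealth tax will fund child care",
    "we must restore the soul of the nation",
    "restore alliances and lead the nation",
]
speakers = ["candidate_a", "candidate_a", "candidate_b", "candidate_b"]

# TF-IDF over unigrams, then L2-regularized logistic regression.
# C is the inverse regularization strength; 1.0 is scikit-learn's default.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 1)),
    LogisticRegression(C=1.0, max_iter=1000),
)
model.fit(quotes, speakers)

# Predict the speaker of an unlabeled quote.
predicted = model.predict(["a tax on wealth"])
print(predicted[0])
```

Swapping `LogisticRegression` for `RandomForestClassifier` or xgboost's `XGBClassifier` keeps the same pipeline shape, although `XGBClassifier` may need the string labels encoded as integers first.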

Tools & Technology

1. NLP: nltk, TfidfVectorizer, CountVectorizer
2. ML: sklearn, xgboost, scipy
3. Visualization: Seaborn, Matplotlib
4. Exploration: Jupyter Notebooks
