Predict tags for posts from StackOverflow with multilabel classification approach.
- Dataset of post titles from StackOverflow
- Transformed text data to numeric vectors using bag-of-words and TF-IDF.
MultiLabelBinarizer to transform labels in a binary form and the prediction will be a mask of 0s and 1s.
Logistic Regression for Multilabel classification
- Coefficient = 10
- L2-regularization technique
Results evaluated using several classification metrics:
- Numpy — a package for scientific computing.
- Pandas — a library providing high-performance, easy-to-use data structures and data analysis tools for the Python
- scikit-learn — a tool for data mining and data analysis.
- NLTK — a platform to work with natural language.