Nowadays, many sources on the Internet generate immense amounts of news every day. In addition, user demand for information has been growing continuously, so it is crucial that news is classified so that users can access the information of interest quickly and effectively. This work presents a machine learning model that classifies news articles into 5 categories: business, entertainment, politics, sport, or tech. A labeled public dataset from the BBC comprising 1490 articles is used to train and evaluate several algorithms. Almost every algorithm gives an accuracy of more than 90%. Complement Naive Bayes achieves good precision (more than 95%), recall (more than 98%), and F1-score (more than 97%) for every class, with an overall accuracy of 98%, so we use the Complement Naive Bayes model for testing.
In today’s world, data is power. News companies store terabytes of data on their servers, and every organization is looking for insights that add value. Among the many examples of analytics being used to drive actions, one that stands out is news article classification. Nowadays, many sources on the Internet generate immense amounts of news every day. In addition, user demand for information has been growing continuously, so it is crucial that news is classified so that users can access the information of interest quickly and effectively. A machine learning model for automated news classification could therefore be used to identify the topics of untracked news and/or make individual suggestions based on the user’s prior interests.
A labeled public dataset from the BBC comprising 1490 articles is used to train and evaluate several algorithms. Almost every algorithm gives an accuracy of more than 90%. Complement Naive Bayes achieves good precision (more than 95%), recall (more than 98%), and F1-score (more than 97%) for every class, with an overall accuracy of 98%, so we use the Complement Naive Bayes model for testing.
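To make the Complement Naive Bayes idea concrete, here is a minimal from-scratch sketch of its decision rule on a made-up toy corpus. It is not the code behind the results above: the function names (`tokenize`, `train_complement_nb`, `predict`), the whitespace tokenizer, the smoothing value `alpha=1.0`, and the six example sentences are all assumptions chosen for illustration. In practice a library implementation such as scikit-learn's `ComplementNB` would normally be used instead.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_complement_nb(docs, labels, alpha=1.0):
    """Return per-class word weights for Complement Naive Bayes.

    For each class c, word counts are gathered from every document that is
    NOT labeled c. The weight of a word is the log of its smoothed share of
    those complement counts.
    """
    vocab = sorted({w for d in docs for w in tokenize(d)})
    total = Counter()  # word counts over all documents
    per_class = {c: Counter() for c in sorted(set(labels))}
    for doc, label in zip(docs, labels):
        counts = Counter(tokenize(doc))
        total.update(counts)
        per_class[label].update(counts)

    weights = {}
    for c, own in per_class.items():
        # Complement counts: occurrences of each word outside class c.
        comp = {w: total[w] - own[w] for w in vocab}
        denom = sum(comp.values()) + alpha * len(vocab)
        weights[c] = {w: math.log((comp[w] + alpha) / denom) for w in vocab}
    return weights


def predict(weights, text):
    """Choose the class whose complement fits the text worst."""
    counts = Counter(tokenize(text))

    def score(c):
        # Words never seen in training contribute 0 to every class.
        return sum(n * weights[c].get(w, 0.0) for w, n in counts.items())

    return min(weights, key=score)


# Toy training data (invented for this sketch, not taken from the BBC dataset).
docs = [
    "shares profit market bank",
    "market growth profit economy",
    "match goal team win",
    "team coach match injury",
    "software phone chip internet",
    "internet software users broadband",
]
labels = ["business", "business", "sport", "sport", "tech", "tech"]

weights = train_complement_nb(docs, labels)
print(predict(weights, "profit market"))
print(predict(weights, "team match goal"))
```

Estimating each class from the documents outside it is what distinguishes the complement variant from standard multinomial Naive Bayes, and it tends to be more stable when some categories have fewer articles than others.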
I am using a public dataset from the BBC comprising 1490 articles, each labeled with one of 5 categories: business, entertainment, politics, sport, or tech.
Dataset URL : https://www.kaggle.com/c/learn-ai-bbc/data
- ArticleId - unique ID number assigned to the record
- Article - text of the header and article
- Category - category of the article (tech, business, sport, entertainment, politics)
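As a quick illustration of the three columns listed above, the following sketch reads rows with Python's standard `csv` module and counts articles per category. The three sample rows are made up and stand in for the real file, and the header names follow the list above; it is worth checking them against the header of the downloaded CSV, since this sketch assumes they match exactly. The helper names `load_articles` and `category_counts` are hypothetical.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for the real file; the header names are assumed
# to match the column list given in the text.
SAMPLE_CSV = """ArticleId,Article,Category
1,"stocks rally as profits rise",business
2,"striker scores twice in final",sport
3,"new phone chip unveiled",tech
"""


def load_articles(handle):
    """Read every row into a dict keyed by column name."""
    return list(csv.DictReader(handle))


def category_counts(rows):
    """Count how many articles fall under each category label."""
    return Counter(row["Category"] for row in rows)


rows = load_articles(io.StringIO(SAMPLE_CSV))
counts = category_counts(rows)
print(counts)
```

To run this against the downloaded data, replace the `io.StringIO(SAMPLE_CSV)` argument with a file opened via `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved locally.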