
This problem statement was given as an internship project by iNeuron. Our goal is to classify an article into 5 categories: Sport, Politics, Tech, Business, and Entertainment, which will help news blogs provide users with the content they want, by category.


ujjwalkar0/News-Article-Sorting


News Article Sorting

Abstract:

Nowadays, many sources on the Internet generate immense amounts of news every day. In addition, users' demand for information keeps growing, so it is crucial to classify news so that users can reach the information they are interested in quickly and effectively. This work presents a machine learning model that classifies news articles into 5 categories: business, entertainment, politics, sport, or tech. A labeled public dataset from the BBC comprising 1490 articles is used to evaluate different algorithms, almost all of which reach an accuracy above 90%. Complement Naive Bayes achieves precision above 95%, recall above 98%, and F1-score above 97% for every class, with an overall accuracy of 98%, so we use the Complement Naive Bayes model for testing.
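The README does not include the training code. As a minimal sketch, assuming plain word-count features (the repository may instead use TF-IDF features or scikit-learn's `ComplementNB`), the Complement Naive Bayes decision rule can be written from scratch: each class is scored with word probabilities estimated from all *other* classes, and the article goes to the class whose complement explains it worst. `tokenize`, `ComplementNaiveBayes`, and the training sentences are hypothetical illustrations.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer: lowercase alphabetic words only (no stemming or stop-word removal).
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


class ComplementNaiveBayes:
    """Complement Naive Bayes over raw word counts (no weight normalization)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # additive (Laplace) smoothing
        self.vocab: set[str] = set()
        self.weights: dict[str, dict[str, float]] = {}

    def fit(self, texts: list[str], labels: list[str]) -> ComplementNaiveBayes:
        # Word counts per class and over the whole corpus.
        class_counts: dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            class_counts[label].update(tokenize(text))
        total = Counter()
        for counts in class_counts.values():
            total.update(counts)
        self.vocab = set(total)
        corpus_size = sum(total.values())
        for label, counts in class_counts.items():
            # Complement counts: how often each word occurs in all *other* classes.
            denom = corpus_size - sum(counts.values()) + self.alpha * len(self.vocab)
            self.weights[label] = {
                word: math.log((total[word] - counts[word] + self.alpha) / denom)
                for word in self.vocab
            }
        return self

    def predict(self, text: str) -> str:
        tokens = [t for t in tokenize(text) if t in self.vocab]
        # Choose the class whose complement gives the document the smallest
        # summed log-probability, i.e. the complement that fits it worst.
        return min(
            self.weights,
            key=lambda label: sum(self.weights[label][t] for t in tokens),
        )


# Toy usage with made-up sentences (not the BBC dataset):
train_texts = [
    "the match ended with a late goal",
    "the striker scored a goal in the match",
    "shares fell as the market closed",
    "the company reported strong profit and shares rose",
]
train_labels = ["sport", "sport", "business", "business"]
model = ComplementNaiveBayes(alpha=1.0).fit(train_texts, train_labels)
print(model.predict("a goal in the final minutes of the match"))  # sport
```

Estimating each class from the complement uses more data per class than standard multinomial Naive Bayes, which is one reason Complement Naive Bayes tends to behave well on text and on imbalanced classes.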

Problem Statement:

In today’s world, data is power. With news companies storing terabytes of data on their servers, everyone is on a quest to discover insights that add value to the organization. Among the many examples of analytics being used to drive action, one that stands out is news article classification. Many sources on the Internet generate immense amounts of news every day, and users' demand for information keeps growing, so news must be classified to let users reach the information they care about quickly and effectively. A machine learning model for automated news classification could then be used to identify the topics of untracked news and/or make personalized suggestions based on a user’s prior interests.

Solution:

Different algorithms are evaluated on a labeled public dataset from the BBC comprising 1490 articles. Almost every algorithm achieves an accuracy above 90%. Complement Naive Bayes achieves precision above 95%, recall above 98%, and F1-score above 97% for every class, with an overall accuracy of 98%, so we use the Complement Naive Bayes model for testing.
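The accuracy and per-class precision, recall, and F1-score figures above follow the standard definitions. A minimal stdlib sketch of how they can be computed from true and predicted labels is shown below; `per_class_metrics` and the toy labels are hypothetical, and the project may have used scikit-learn's `sklearn.metrics.classification_report` instead.

```python
from __future__ import annotations

from collections import Counter


def per_class_metrics(y_true: list[str], y_pred: list[str]) -> dict:
    """Return overall accuracy and per-class precision, recall and F1-score."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    true_count = Counter(y_true)  # support: articles that truly belong to each class
    pred_count = Counter(y_pred)  # articles predicted as each class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    report = {"accuracy": sum(hits.values()) / len(y_true)}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = hits[label] / pred_count[label] if pred_count[label] else 0.0
        recall = hits[label] / true_count[label] if true_count[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Toy usage with made-up labels:
y_true = ["sport", "sport", "tech", "business", "tech"]
y_pred = ["sport", "tech", "tech", "business", "tech"]
report = per_class_metrics(y_true, y_pred)
print(report["accuracy"])  # 0.8
```

Precision is computed per predicted class and recall per true class, so a model can have high accuracy while one class still shows low recall; reporting all three per class guards against that.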

Dataset: BBC News Classification

I am using a public dataset from the BBC comprising 1490 articles, each labeled with one of 5 categories: business, entertainment, politics, sport, or tech.

Data fields:

  • ArticleId - unique ID assigned to each record
  • Article - text of the article's headline and body
  • Category - category of the article (tech, business, sport, entertainment, politics)
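As a minimal sketch of loading records with these fields, assuming a CSV file whose header matches the field names listed above (the actual file's column names may differ), the stdlib `csv` module is enough to read the rows and reject unexpected category labels. `load_articles` and the in-memory sample rows are hypothetical.

```python
from __future__ import annotations

import csv
import io
from collections import Counter
from typing import TextIO

# Column names as listed in this README; the real CSV header may differ.
EXPECTED_FIELDS = {"ArticleId", "Article", "Category"}
CATEGORIES = {"business", "entertainment", "politics", "sport", "tech"}


def load_articles(stream: TextIO) -> list[dict]:
    """Read ArticleId/Article/Category rows and validate each category label."""
    reader = csv.DictReader(stream)
    missing = EXPECTED_FIELDS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    articles = []
    for row in reader:
        category = row["Category"].strip().lower()
        if category not in CATEGORIES:
            raise ValueError(f"unknown category {category!r} in article {row['ArticleId']}")
        articles.append(
            {"id": int(row["ArticleId"]), "text": row["Article"], "category": category}
        )
    return articles


# Toy usage with an in-memory CSV; for a real file, pass open(path, newline="", encoding="utf-8").
sample = io.StringIO(
    "ArticleId,Article,Category\n"
    '1,"stocks rally as profits rise",business\n'
    '2,"new phone unveiled at trade show",tech\n'
    '3,"striker scores twice in cup win",sport\n'
)
articles = load_articles(sample)
print(Counter(a["category"] for a in articles))
```

Counting categories right after loading is a quick way to check class balance before choosing a classifier.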

Implementation:

2. Website interface and REST API to use it from anywhere: https://github.com/Uncoded-AI/Website/

3. Python module to integrate it into any Python program: https://github.com/Uncoded-AI/docType
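The README links the website and the Python module but does not describe their interfaces. As a hypothetical sketch of how a trained classifier could be exposed over HTTP, the stdlib WSGI app below accepts `POST /predict` with a JSON body `{"text": ...}` and returns the predicted category. The route, JSON fields, `make_app`, and `keyword_predict` are all assumptions and are not the API of the Uncoded-AI repositories; `keyword_predict` is only a placeholder for a trained model's prediction function.

```python
from __future__ import annotations

import json
from typing import Callable, Iterable


def make_app(predict: Callable[[str], str]):
    """Return a WSGI app that answers POST /predict {"text": ...} with {"category": ...}."""

    def app(environ: dict, start_response) -> Iterable[bytes]:
        def respond(status: str, payload: dict) -> list[bytes]:
            body = json.dumps(payload).encode("utf-8")
            start_response(status, [("Content-Type", "application/json"),
                                    ("Content-Length", str(len(body)))])
            return [body]

        if environ.get("REQUEST_METHOD") != "POST" or environ.get("PATH_INFO") != "/predict":
            return respond("404 Not Found", {"error": "use POST /predict"})
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            data = json.loads(environ["wsgi.input"].read(length) or b"{}")
            text = data["text"]
            if not isinstance(text, str):
                raise TypeError("text must be a string")
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, missing "text" field, or wrong type.
            return respond("400 Bad Request", {"error": 'expected JSON body {"text": "..."}'})
        return respond("200 OK", {"category": predict(text)})

    return app


def keyword_predict(text: str) -> str:
    """Hypothetical stand-in for the trained classifier."""
    return "sport" if "goal" in text.lower() else "business"


app = make_app(keyword_predict)
```

Because `make_app` takes any text-to-label function, the same wrapper works for whichever model is trained. To try it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` serves the app on port 8000.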

Documentation:

Summary:

Watch the video
