OliverXiaoxiong/naturalLanguageProcessing

naturalLanguageProcessing

Journey to learning natural language processing.

Abstract

In times of political turmoil, the news we see is often not fully accurate, whatever its source. Different parties and biased outlets release their own versions of events, and stories spread quickly through sites such as Buzzfeed and Facebook. We are trying to predict the accuracy of news from its text. This relies on natural language processing (NLP), a field that uses machine learning, among other techniques, to help computers work with human language.

Goal

Learn natural language processing. Predict the accuracy of news based on keywords/tags from the article title. One approach is to identify the grammatical roles of words (such as the verb and its object) in the sentences of the article title and of a summary of the article.

Method

Fact-check the accuracy of news based on keywords/phrases. The third column of the data contains the statements; predict how many of them could be "fact-checked." Try to break each statement into Subject-Verb-Object tuples and check those tuples against the data.

Tools: the spaCy, NLTK, and scikit-learn Python libraries.
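
The Subject-Verb-Object idea above can be sketched in plain Python. This is only a minimal illustration, not the project's code: the `Token` type, the `extract_svo` helper, and the hand-labelled parse are all hypothetical. In practice the dependency labels (`nsubj`, `dobj`, `ROOT`) and head indices would come from a dependency parser such as spaCy's rather than being written by hand.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    text: str
    dep: str   # dependency label, e.g. "nsubj", "ROOT", "dobj"
    head: int  # index of the token this one attaches to


def extract_svo(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Pair every subject and direct object that attach to the same head word."""
    triples = []
    for head_idx, head in enumerate(tokens):
        subjects = [t.text for t in tokens
                    if t.head == head_idx and t.dep == "nsubj"]
        objects = [t.text for t in tokens
                   if t.head == head_idx and t.dep == "dobj"]
        for subj in subjects:
            for obj in objects:
                triples.append((subj, head.text, obj))
    return triples


# Hand-labelled parse of "Congress passed the bill" (invented example;
# a real parser would supply these labels and head indices).
parsed = [
    Token("Congress", "nsubj", 1),
    Token("passed", "ROOT", 1),
    Token("the", "det", 3),
    Token("bill", "dobj", 1),
]

svo = extract_svo(parsed)
print(svo)
```

Keeping the extraction separate from the parsing step means the tuple logic can be tested on small hand-made inputs before a real parser is plugged in.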

To Do 4/19/2017

  • Get familiar with NLP using the resources below (feel free to add your own!)
  • Clean up code and explore the data
  • Remove the punctuation and stopwords from the data
  • Tokenize the words and split the summaries into tuples
  • Remove 0 1 2 3 4 5 column
  • Create a GitHub branch and clone the repo to your personal computer
  • Algorithms to use: Naive Bayes classifier, SVM
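
As a rough illustration of the cleaning, tokenizing, and Naive Bayes items in the list above, here is a standard-library-only sketch. Everything in it is an assumption made for illustration: the short `STOPWORDS` set stands in for a fuller list such as NLTK's, the `preprocess` function and `NaiveBayes` class are hypothetical names, and the four training statements are invented toy data, not the project's dataset. scikit-learn provides tested implementations of both Naive Bayes and SVM that would be the natural choice for real work.

```python
import math
import string
from collections import Counter, defaultdict

# Tiny illustrative stopword list; a real one (e.g. NLTK's) is much longer.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "that", "on"}


def preprocess(text):
    """Lowercase, strip punctuation, split on whitespace, drop stopwords."""
    table = str.maketrans("", "", string.punctuation)
    words = text.lower().translate(table).split()
    return [w for w in words if w not in STOPWORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(preprocess(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Start from the log prior, then add one log likelihood per word.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in preprocess(doc):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy statements, for illustration only.
statements = [
    "The moon is made of cheese.",
    "Aliens built the pyramids.",
    "Water boils at 100 degrees Celsius at sea level.",
    "The Earth orbits the Sun.",
]
labels = ["false", "false", "true", "true"]

model = NaiveBayes().fit(statements, labels)
tokens = preprocess("The moon is made of cheese.")
prediction = model.predict("Aliens made the moon")
print(tokens, prediction)
```

With so little training data the classifier only reflects word overlap with each class, so the sketch shows the mechanics rather than any real fact-checking ability.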

Resources

  • https://www.dataquest.io/blog/natural-language-processing-with-python/
  • https://pythonprogramming.net/naive-bayes-classifier-nltk-tutorial/?completed=/words-as-features-nltk-tutorial/
  • http://textminingonline.com/dive-into-nltk-part-ii-sentence-tokenize-and-word-tokenize
  • http://victoria.lviv.ua/html/fl5/NaturalLanguageProcessingWithPython.pdf
