naturalLanguageProcessing

Journey to learning natural language processing.

Abstract

In times of political turmoil, the news we see is often not fully accurate, whatever the source. Parties with different biases release their own versions of the news, and tabloid-style stories circulate through sources such as Buzzfeed and Facebook. We are trying to predict the accuracy of news from its text. This relies on natural language processing (NLP), a field that combines linguistics and machine learning so that a computer can process and analyze human language.

Goal

Learn natural language processing. Predict the accuracy of news from keywords/tags in the article title. One method is to distinguish the verb and the object in a sentence taken from the article title and from a summary of the article.

Method

Fact-check the accuracy of news based on keywords/phrases. The third column of the data contains the statements; predict how many of them could be "fact-checked." Try to break each statement into Subject-Verb-Object tuples and check them against the data.

We use the spaCy Python module, NLTK, and scikit-learn.

To Do 4/19/2017

Get familiar with NLP using the resources below (feel free to add your own!)
Clean up the code and explore the data
Remove punctuation and stopwords from the data
Tokenize the words and split the summaries into tuples
Remove the 0 1 2 3 4 5 column
Make a GitHub branch and clone the repo to your personal computer
Algorithms to use: Naive Bayes classifier, SVM
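
The cleaning, tokenizing, and classification items above could look something like the sketch below. It is only an illustration under stated assumptions: the statements and labels are made-up toy data, `preprocess` is a hypothetical helper, and scikit-learn's built-in `ENGLISH_STOP_WORDS` stands in for NLTK's stopword list (which needs a separate download).

```python
import string

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(statement):
    """Lowercase, strip punctuation, split on whitespace, drop stopwords."""
    cleaned = statement.lower().translate(_PUNCT_TABLE)
    return [tok for tok in cleaned.split() if tok not in ENGLISH_STOP_WORDS]


# Hypothetical toy data; the real statements and labels come from the dataset.
statements = [
    "The unemployment rate fell last year.",
    "Taxes were raised on every family!",
    "The city built three new schools.",
    "Aliens secretly run the state budget.",
]
labels = ["true", "false", "true", "false"]

# Passing a callable as analyzer makes CountVectorizer use our tokens directly.
model = make_pipeline(CountVectorizer(analyzer=preprocess), MultinomialNB())
model.fit(statements, labels)

tokens = preprocess("The city raised taxes!")
prediction = model.predict(["The city raised taxes!"])[0]
print(tokens, prediction)
```

To try an SVM instead, `MultinomialNB()` can be swapped for `sklearn.svm.LinearSVC()` in the same pipeline. With only four toy statements the prediction itself means nothing; the point is the shape of the workflow.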

Resources

https://www.dataquest.io/blog/natural-language-processing-with-python/
https://pythonprogramming.net/naive-bayes-classifier-nltk-tutorial/?completed=/words-as-features-nltk-tutorial/
http://textminingonline.com/dive-into-nltk-part-ii-sentence-tokenize-and-word-tokenize
http://victoria.lviv.ua/html/fl5/NaturalLanguageProcessingWithPython.pdf
