SummaryTweets

##Authors

Irene Feng, Orestis Lykouropoulos, Patrick Xu

##Overview

SummaryTweets is a Python-based sentence compression program that primarily uses TF-IDF and phrase substitution to reduce a given text to a desired length. The default output length is 140 characters, the length of one Tweet.

We use TF-IDF, or Term Frequency-Inverse Document Frequency, to determine the importance of each term in a sentence, and we then sum the term scores within each sentence to build a sentence score. TF-IDF is based on the idea that, given a large enough sample of text, the frequency of a word is inversely related to its importance. Using word counts from a corpus (a large collection of sample texts), we assign each word a score that reflects its importance, and from these scores we determine the importance of a sentence. TF-IDF is not a perfect scoring method, but it has nonetheless proved quite accurate in our application.
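The scoring idea described above can be illustrated with a minimal sketch. This is not the code in tf_idf.py: the function names (`tokenize`, `build_idf`, `score_sentence`), the regular-expression tokenizer, the unsmoothed `log(N / df)` weighting, and the default weight for unseen words are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (illustrative tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def build_idf(corpus_docs):
    """Map each word to log(N / df), where df is the number of documents containing it."""
    n_docs = len(corpus_docs)
    doc_freq = Counter()
    for doc in corpus_docs:
        # Count each word at most once per document.
        doc_freq.update(set(tokenize(doc)))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def score_sentence(sentence, idf, default_idf):
    """Sum tf * idf over the distinct words of a sentence.

    Words missing from the corpus get default_idf (an assumption of this sketch).
    """
    counts = Counter(tokenize(sentence))
    return sum(tf * idf.get(word, default_idf) for word, tf in counts.items())


# Toy corpus standing in for a large collection such as the Brown Corpus.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the parser builds a tree",
]
idf = build_idf(corpus)
unseen_weight = math.log(len(corpus))  # treat unseen words as maximally rare

common = score_sentence("the cat", idf, unseen_weight)
rare = score_sentence("the parser", idf, unseen_weight)
print(common < rare)
```

In this toy corpus "the" appears in every document and so contributes nothing, while "parser" appears in only one document and dominates its sentence's score.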

Phrasal substitution is driven by rules from the Paraphrase Database (PPDB). SummaryTweets currently uses over 30,000 lexical rules. Far more could be used, at the cost of substitution accuracy.
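As a rough illustration of phrase substitution for compression, the sketch below applies a small rule table and keeps only replacements that shorten the text. The rules shown are made-up examples rather than entries taken from PPDB, and `substitute_phrases` is a hypothetical helper, not a function from this repository.

```python
import re


def substitute_phrases(text, rules):
    """Replace each phrase in `rules` with its paraphrase when that saves characters."""
    # Try longer phrases first so they are not broken up by shorter rules.
    for phrase in sorted(rules, key=len, reverse=True):
        replacement = rules[phrase]
        if len(replacement) >= len(phrase):
            continue  # this rule would not compress the text
        pattern = r"\b" + re.escape(phrase) + r"\b"
        # A function replacement avoids treating backslashes in the paraphrase specially.
        text = re.sub(pattern, lambda match, r=replacement: r, text, flags=re.IGNORECASE)
    return text


# Hypothetical rules for illustration only.
example_rules = {
    "in order to": "to",
    "a large number of": "many",
    "use": "utilize",  # skipped because the paraphrase is longer
}

shortened = substitute_phrases(
    "We ran it in order to test a large number of cases.", example_rules
)
print(shortened)
```

A larger rule table increases the chance of a match but also the chance of a substitution that changes the meaning, which is the accuracy trade-off mentioned above.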

##How to Use

SummaryTweets has been uploaded to the following webpage:

http://www.cs.dartmouth.edu/cgi-bin/cgiwrap/patxu/summary.cgi

It can also be used by running the bash shell script run.sh, which runs tf_idf.py with appropriate input arguments. The Python file itself can be run with

python tf_idf.py

Use the flag "-h" for information about the input arguments.

##Structure

/CorpusFolder- contains the Brown Corpus and ARPA bigram probabilities

/pickl- contains the serialized dictionaries for our corpus. These dictionaries are used for sentence compression and TF-IDF.

/stat_parser- uses the CKY algorithm to return a parse tree of a sentence

/styles- files for marking up the webpage
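For the /pickl entry above, serializing a dictionary and loading it back can be sketched with Python's standard `pickle` module. The helper names, the file name `word_counts.pkl`, and the sample counts are assumptions for illustration. They do not describe the actual files in /pickl or the contents of picklme.py.

```python
import os
import pickle
import tempfile


def save_dictionary(dictionary, path):
    """Serialize a dictionary to `path` in binary pickle format."""
    with open(path, "wb") as handle:
        pickle.dump(dictionary, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_dictionary(path):
    """Load a dictionary previously written by save_dictionary."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Round trip with made-up word counts in a temporary directory.
sample_counts = {"the": 3, "cat": 2, "parser": 1}
with tempfile.TemporaryDirectory() as folder:
    pickle_path = os.path.join(folder, "word_counts.pkl")  # hypothetical file name
    save_dictionary(sample_counts, pickle_path)
    restored = load_dictionary(pickle_path)

print(restored == sample_counts)
```

Precomputing the dictionaries once and loading them at startup avoids recounting the corpus on every run. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.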
