# Party Bias Clustering of US Legislators

Our project focuses on analyzing the language used by US legislators. We want to know how their language varies by party and state. We also want to know how their language changes over time.

Scripts are used roughly in the following order:

File | Description
--- | ---
congressPdfPuller.pl | Bulk downloader for Congressional Record PDFs
congTextToCSV.py | Imports function from updated parsing file READMEtextToCSV.txt
listCongressNumberAndFilenameInCSV.pl | Helper tool that extracts distinct congress numbers and dates
remove2001.py | Removes 2001 dates from congressionalRecords.csv, since we're only interested in 2006+
c_processor_final.py | Preprocessor that parses Congressional Record PDFs into CSVs
legislators-current.json.txt | Biographical information for current legislators
legislators-historical.json.txt | Biographical information for past legislators
splitCsvByDate.pl | Splits processed CSVs by date
tf_idf folder | TF-IDF calculation
tfidfDiffs.py | Calculates Delta TF-IDF scores
allYearsTFIDF_Diff.py | Runs the tfidfDiffs.py script over all the congressional records
fixCongress82.py | Fixes mistake where congress = 82
parallel_script.sh | Added extra formatting to fix some errors in filenames
gnuParallelList.txt | Removed the parallel Delta TF-IDF script and appended the original
cleanedTableData.html | Added HTML table parser and HTML file to get approval and disapproval scores
htmlTableParser.pl | Took the percent formatting off
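
The Delta TF-IDF step listed above can be illustrated with a minimal sketch. It assumes one common formulation, in which a term's count in a document is multiplied by the difference between its IDF in two corpora (for example, speeches from two parties). This is not the code in tfidfDiffs.py: the function names, the pre-tokenized input format, and the `smoothing` parameter are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, how many documents in `docs` contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def delta_tfidf(doc_tokens, docs_a, docs_b, smoothing=1.0):
    """Score each term of one document as count * (IDF in A - IDF in B).

    `smoothing` is an assumption added here so that terms missing from
    one corpus do not cause a division by zero.
    """
    df_a = document_frequencies(docs_a)
    df_b = document_frequencies(docs_b)
    n_a, n_b = len(docs_a), len(docs_b)
    scores = {}
    for term, count in Counter(doc_tokens).items():
        idf_a = math.log2((n_a + smoothing) / (df_a[term] + smoothing))
        idf_b = math.log2((n_b + smoothing) / (df_b[term] + smoothing))
        scores[term] = count * (idf_a - idf_b)
    return scores


if __name__ == "__main__":
    # Toy corpora; real input would come from the processed CSVs.
    corpus_a = [["tax", "cut"], ["tax", "freedom"]]
    corpus_b = [["health", "care"], ["tax", "care"]]
    print(delta_tfidf(["tax", "tax", "care"], corpus_a, corpus_b))
```

With this sign convention, a negative score marks a term that appears in a larger share of corpus A's documents, and a positive score marks a term more typical of corpus B. A term that is equally common in both corpora scores zero.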
