Short Description

  • This is a Rails 2.3.7 application that helps you collect Twitter data.

How to Install

Install RVM

  • Install rvm if you are not using it (
     $ bash -s stable < <(curl -s
     $ source ~/.bash_profile
     $ rvm requirements
  • The rvm spec file is already in the repo
  • Install ruby 1.8.7
      $ rvm install 1.8.7
  • Create your gemset
     rvm gemset create 'socializer' 

    * If you run into permission problems, try creating the gem directory yourself:
    sudo mkdir -p ~/.gem/specs
    sudo chmod 777 ~/.gem/specs

Setup Files

  • To get it running you will need to create:
    • twitter.yml containing your Twitter credentials
    • bitly.yml containing your bitly credentials
    • database.yml containing the database credentials
    • See twitter.example.yml or bitly.example.yml for details
    • If your Twitter account is NOT whitelisted, make sure you do not exhaust your API rate limit by running too many workers
    • Create the directory “/data” under your Rails root to store the lists
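
The setup steps above can be checked with a small script before starting the app. This is only a sketch: the file names come from the list above, but placing all three files under config/ is an assumption (only database.yml is known to live there by Rails convention), and the helper names missing_configs and ensure_data_dir are hypothetical.

```ruby
require 'fileutils'
require 'tmpdir'

# File names taken from the setup list; the config/ location is an assumption.
REQUIRED_CONFIGS = %w[twitter.yml bitly.yml database.yml].freeze

# Returns the required config files that are missing under <root>/config.
def missing_configs(root)
  REQUIRED_CONFIGS.reject { |name| File.exist?(File.join(root, 'config', name)) }
end

# Creates <root>/data if it does not exist yet and returns its path.
def ensure_data_dir(root)
  path = File.join(root, 'data')
  FileUtils.mkdir_p(path)
  path
end

# Demo in a throwaway directory so nothing in a real checkout is touched.
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'config'))
  FileUtils.touch(File.join(root, 'config', 'database.yml'))
  puts "Missing config files: #{missing_configs(root).join(', ')}"
  puts "Data directory: #{ensure_data_dir(root)}"
end
```

To use it against a real checkout, call both helpers with the Rails root instead of the temporary directory.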

Install Gems Dependencies

  • The app uses Rails 2.3.7, so all gems are chosen to match that framework version
    gem install rails -v =2.3.7
    gem update --system 1.5.3
  • To install the gem dependencies, first install rails_gem_install and then run it:
    gem install rails_gem_install
    RAILS_ENV=development rails_gem_install


  • Test whether the application works correctly
  • You will need rspec/rspec-rails and factory girl to run the tests.
  • You will need to start Solr in test mode first
    RAILS_ENV=test rake sunspot:solr:start
    spec spec

Get Delayed Jobs working

  • Create the necessary files with: script/generate delayed_job
  • To start collecting persons or feeds you need to start a few delayed job workers. To do so, use the script
    "./script/delayed_job -n 4"
    • Benchmarks I measured for collecting tweets, by number of workers (n):
      • n 4: 40,000 tweets in 10 min
      • n 8: 90,000 tweets in 10 min
      • n 16: 180,000 tweets in 10 min (70% CPU usage)
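
To compare the benchmark figures above, it helps to normalise them to tweets per worker per minute. The sketch below only does that arithmetic on the three measurements listed; the helper name tweets_per_worker_minute is hypothetical.

```ruby
# Figures from the benchmark list: workers => tweets collected in 10 minutes.
BENCHMARKS = { 4 => 40_000, 8 => 90_000, 16 => 180_000 }
MINUTES = 10

# Average number of tweets a single worker collected per minute.
def tweets_per_worker_minute(workers, tweets, minutes = MINUTES)
  tweets.to_f / minutes / workers
end

BENCHMARKS.sort.each do |workers, tweets|
  rate = tweets_per_worker_minute(workers, tweets)
  puts "n=#{workers}: #{rate.round} tweets per worker per minute"
end
```

The result is 1,000 tweets per worker per minute for n 4 and 1,125 for both n 8 and n 16, so throughput scaled roughly linearly with the number of workers in these measurements.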

Start Solr and Webserver

  • All tweets are indexed in the background by a Solr (Lucene) server
  • Indexing uses the sunspot and solr gems.
  • Make sure to start Solr before starting the web server.
    rake sunspot:solr:start

Dumping the DB and restoring it

  • To let you exchange your results, the app contains a rake task that dumps the existing DB into /dump
  • It uses the dump plugin for Rails 2.3
  • There is a small example DB in dump containing 57 persons in one project and ~100K tweets incl. retweets
  • You can use it to experiment with the data
    rake dump
    rake dump:restore # to restore a db


It does the following:

It uses Delayed Job to run the collection in the background.
The Twitter API is wrapped using the grackle and twitter gems.

Persons are organized in projects that contain a set of people

collect one person
collect multiple persons based on a csv import
collect the egonetwork of a given person
show all people
show statistics of the people collected (friend and follower distributions, origin, etc.)
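
As an illustration of the CSV import mentioned above, the sketch below extracts a list of screen names from CSV text. The layout of the app's import file is not documented here, so the format (one screen name in the first column of each row, no header) and the helper name screen_names_from_csv are assumptions.

```ruby
require 'csv'

# Assumed format: one Twitter screen name in the first column of each row.
# Blank rows are skipped, a leading "@" is removed, duplicates are dropped.
def screen_names_from_csv(text)
  CSV.parse(text)
     .map { |row| row.first.to_s.strip.sub(/\A@/, '') }
     .reject { |name| name.empty? }
     .uniq
end

sample = "alice\n@bob\n\nalice\ncarol,some note\n"
puts screen_names_from_csv(sample).inspect
```

Each resulting name could then be handed to whatever job collects a single person.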

Connections between persons
Connections between persons are stored not in the DB but on disk in a PStore
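
For readers unfamiliar with PStore, here is a minimal sketch of how connections could be written to and read from such a file using Ruby's standard library. The key layout (screen name mapped to an array of friend names), the file name, and the helper names are assumptions, not the app's actual schema.

```ruby
require 'pstore'
require 'tmpdir'

# Hypothetical layout: one key per person, value is an array of friend names.
def store_connections(store, person, friends)
  store.transaction do
    store[person] = friends
  end
end

# Reads the friends of a person; returns an empty array for unknown persons.
def load_connections(store, person)
  store.transaction(true) do # true opens a read-only transaction
    store[person] || []
  end
end

# Demo in a temporary directory; the app keeps its own file elsewhere.
Dir.mktmpdir do |dir|
  store = PStore.new(File.join(dir, 'connections.pstore'))
  store_connections(store, 'alice', %w[bob carol])
  puts load_connections(store, 'alice').inspect
  puts load_connections(store, 'nobody').inspect
end
```

All reads and writes must happen inside a transaction block; PStore raises an error otherwise.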

collect the tweets of a person
collect the tweets of all persons
collect tweets based on a csv list
collect all retweets of all collected tweets
export all tweets into a csv
show statistics on the tweets (links used, keywords, timeline)

export the friendship network of the collected persons in a project the formats:
export the retweet networks of persons
export the @ networks between persons
export the person stats
export the twitter links of persons
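
To make the idea of a retweet network concrete, the sketch below turns a list of tweets into weighted edges between persons. It does not reproduce the app's exporter: the tweet representation (hashes with :author and :retweet_of keys), the edge-list output, and the helper name retweet_edges are all assumptions for illustration.

```ruby
# Hypothetical tweet records: :retweet_of names the original author,
# or is nil when the tweet is not a retweet.
def retweet_edges(tweets)
  counts = Hash.new(0)
  tweets.each do |tweet|
    next unless tweet[:retweet_of]
    counts[[tweet[:author], tweet[:retweet_of]]] += 1
  end
  # One [retweeter, original_author, weight] triple per pair, sorted by name.
  counts.map { |(from, to), weight| [from, to, weight] }.sort
end

tweets = [
  { :author => 'alice', :retweet_of => 'bob' },
  { :author => 'alice', :retweet_of => 'bob' },
  { :author => 'carol', :retweet_of => 'alice' },
  { :author => 'bob',   :retweet_of => nil }
]

retweet_edges(tweets).each do |from, to, weight|
  puts "#{from},#{to},#{weight}"
end
```

An edge list like this can be loaded by most network analysis tools; the @ network could be built the same way from mentions instead of retweets.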

It has some onboard scrapers under tasks that scrape the following websites

It can compute sentiment for German tweets