
Automated Multi-task Learning

Automated MTL provides two generalized multi-tasking, recurrent deep learning architectures: the CRNN (Cascaded Recurrent Neural Network) and the MRNN (Multi-tasking Recurrent Neural Network). Both use the statistical regularities within the original dataset itself to reinforce the representations learned for the primary task.

The automated MTL architectures have achieved state-of-the-art performance in sentiment analysis, topic prediction, and hashtag recommendation on a diverse set of text corpora, including Twitter, Rotten Tomatoes, and IMDB.

The Infinite Data Pipeline (∞DP):

A side project of automated MTL resulted in the Infinite Data Pipeline, which is built on Java, Apache Storm, Apache Kafka, and the Twitter API. The Infinite Data Pipeline streams and preprocesses Twitter data online and injects the streamed data directly into a running TensorFlow topology.
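
Once the pipeline is running, a quick way to confirm that tweets are actually flowing is to read the Kafka topic with Kafka's bundled console tools. This is only a sketch: the topic name `tweets` is a placeholder, and the topic your Storm topology actually writes to may be named differently.

```bash
# Run from the Kafka installation directory (Kafka 0.x-era CLI, ZooKeeper on its default port)
bin/kafka-topics.sh --list --zookeeper localhost:2181

# Tail the streamed tweets; replace "tweets" with the topic the topology actually writes to
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic tweets --from-beginning
```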

Requirements:

  1. cuDNN (tested on cuDNN 5105)
  2. CUDA drivers + NVIDIA graphics card with compute capability 5.0+ (tested on a GTX 1080)
  3. Apache Zookeeper (tested on version 3.4.6)
  4. Apache Storm (tested on version 0.9.5)
  5. Twitter API + developer credentials (tested with Twitter4J 4.0.4)
  6. Theano (tested on version 0.8.2)
  7. Keras (tested on the latest version as of January 9, 2017)
  8. Linux-based OS (tested on Ubuntu 16.04 LTS)
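
Before installing anything else, it is worth sanity-checking the GPU and Python pieces of this list; the cuDNN header path below is the common default and may differ on your system.

```bash
nvcc --version                                            # CUDA toolkit version
nvidia-smi                                                # NVIDIA driver and GPU
grep -A 2 CUDNN_MAJOR /usr/local/cuda/include/cudnn.h     # cuDNN version (header path may differ)
python -c "import theano; print(theano.__version__)"      # expect 0.8.2
python -c "import keras; print(keras.__version__)"
```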

Install Guide:

  1. Install CUDA and cuDNN.
  2. Set up Apache Storm and the Twitter API.
  3. Install Keras and Theano.
  4. Download Kafka 2.10 (see the example commands below).
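
A minimal sketch of steps 3 and 4; the Keras pin and the Kafka release number below are assumptions (any Scala 2.10 build from the Apache archive should do), so adjust them to match your setup.

```bash
# Step 3: install the Python deep learning stack (versions approximate the requirements above)
pip install Theano==0.8.2 keras==1.2.0

# Step 4: download and unpack a Scala 2.10 build of Kafka (release number is an assumption)
wget https://archive.apache.org/dist/kafka/0.10.1.1/kafka_2.10-0.10.1.1.tgz
tar -xzf kafka_2.10-0.10.1.1.tgz
# Point KAFKAHOME in scripts/startKafkaServer.sh at the unpacked directory
```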

Data Miner Run Guide (MacOSX Local):

  1. Run systemStartMac.sh to start your Storm instance. Make sure KAFKAHOME is set correctly in scripts/startKafkaServer.sh.
  2. Edit src/storm/pom.xml with the appropriate Twitter credentials. Run mvn install inside src/storm to compile and mvn exec:java to start the data collection and streaming.
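
The two steps above condensed into a shell session, assuming the scripts live in the repository root and the Twitter credentials in src/storm/pom.xml are already filled in:

```bash
./systemStartMac.sh      # starts Zookeeper, Nimbus, Supervisor, Storm UI, and the Kafka server
cd src/storm
mvn install              # compile the Storm topology
mvn exec:java            # start data collection and streaming
```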

Data Miner Run Guide (Ubuntu 16.04 Local):

  1. Run systemStartUbuntu.sh to start your Storm instance.
  2. Run runAPI.sh to open the Twitter stream and start collection (requires editing runAPI.sh with the correct Twitter API credentials first).
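
The same two steps as a shell session, again assuming the scripts live in the repository root:

```bash
./systemStartUbuntu.sh   # starts Zookeeper, Storm, and the Kafka server in new terminals
./runAPI.sh              # opens the Twitter stream (edit your API credentials into the script first)
```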

Tweetnet Run Guide:

  1. Run tweetnet.py.
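
Assuming tweetnet.py uses the Keras/Theano stack from the requirements above, the standard Keras and Theano environment variables can be set on the command line to select the backend and force GPU execution; these flags belong to Keras/Theano, not to tweetnet.py itself.

```bash
KERAS_BACKEND=theano THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python tweetnet.py
```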

Notes:

Note: The system start script opens five new terminals: Apache Zookeeper, the Storm Nimbus, the Storm Supervisor, the Storm UI, and the Kafka server. Each new terminal requires sudo access and will prompt for the user's password. To view the Storm UI, navigate to localhost:8080.
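
To confirm the services came up without switching between the five terminals, the Storm UI REST API and ZooKeeper's four-letter commands can be queried directly (ports are the defaults mentioned above):

```bash
curl -s http://localhost:8080/api/v1/cluster/summary   # Storm UI REST API
echo ruok | nc localhost 2181                          # ZooKeeper health check; prints "imok" when healthy
```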

Note: In the CUDA setup, the section where you link cuda to cuda-7.5 is outdated.

Instead of following this step:

export CUDA_HOME=/usr/local/cuda-7.5

Make sure you are using and linking CUDA v8.0:

export CUDA_HOME=/usr/local/cuda-8.0
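
If you manage the environment by hand, CUDA_HOME is usually accompanied by matching PATH and LD_LIBRARY_PATH entries; the paths below assume a default CUDA 8.0 installation:

```bash
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
```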

Note: You will need to register for Twitter Developer credentials to run the data miner.