Twitter-Streams

Course Final Project for CSCI 6502 Big Data Analytics: Systems, Algorithms, and Applications

This project uses the Twitter Streaming API and Google Cloud to build an end-to-end pipeline for processing a Twitter stream.

(Twitter stream) streams_producer.py -> Google Pub/Sub topic -> (Apache Beam pipeline) streaming_tweet.py -> Google BigQuery dataset -> Google Data Studio
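
As a rough illustration of the first hop in this flow: Pub/Sub message payloads are raw bytes, so a producer would typically serialize each tweet before publishing and the pipeline would decode it on the other side. The sketch below assumes tweets are exchanged as UTF-8 encoded JSON; the helper names `encode_tweet` and `decode_tweet` are hypothetical and are not taken from streams_producer.py or streaming_tweet.py.

```python
import json


def encode_tweet(tweet: dict) -> bytes:
    """Serialize a tweet dict to UTF-8 JSON bytes (assumed wire format)."""
    # ensure_ascii=False keeps emoji and accented characters readable in the payload.
    return json.dumps(tweet, ensure_ascii=False).encode("utf-8")


def decode_tweet(payload: bytes) -> dict:
    """Inverse of encode_tweet: turn a message payload back into a dict."""
    return json.loads(payload.decode("utf-8"))


if __name__ == "__main__":
    sample = {"id_str": "1", "text": "héllo from the stream"}
    payload = encode_tweet(sample)
    print(type(payload).__name__, decode_tweet(payload) == sample)
```

Agreeing on one payload format on both sides keeps the producer and the Beam pipeline independent of each other; either one can be replaced as long as the message shape stays the same.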

The necessary setup shell scripts are included for staging, deploying, and starting a Google Cloud Dataflow job, which automatically provisions a subscription when given a topic to connect to.

This pipeline, as implemented in streaming_tweet.py, serves as a template for streaming Twitter data into Google BigQuery.
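
When adapting the template, most of the work is usually in mapping each incoming message to a row that matches the destination table. The following is a minimal sketch of such a mapping, not the code in streaming_tweet.py. It assumes each payload is a UTF-8 JSON tweet with the classic `id_str`, `created_at`, `user.screen_name`, `lang`, and `text` fields; the column names, the `BQ_SCHEMA` string, and the function `tweet_to_row` are all hypothetical.

```python
import json
from datetime import datetime

# Hypothetical table layout, written in the "name:TYPE,..." schema string form.
BQ_SCHEMA = ",".join([
    "tweet_id:STRING",
    "created_at:TIMESTAMP",
    "user:STRING",
    "lang:STRING",
    "text:STRING",
])

# Classic Twitter timestamp layout, e.g. "Wed Oct 10 20:19:24 +0000 2018".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweet_to_row(payload: bytes) -> dict:
    """Convert one JSON tweet payload into a flat dict matching BQ_SCHEMA."""
    tweet = json.loads(payload.decode("utf-8"))
    user = tweet.get("user") or {}  # tolerate tweets without a user object
    created = tweet.get("created_at")
    if created:
        # Normalize to ISO 8601 so it can be loaded as a TIMESTAMP.
        created = datetime.strptime(created, TWITTER_TIME_FORMAT).isoformat()
    return {
        "tweet_id": tweet.get("id_str"),
        "created_at": created,
        "user": user.get("screen_name"),
        "lang": tweet.get("lang"),
        "text": tweet.get("text"),
    }
```

In an Apache Beam pipeline, a function like this could be applied with `beam.Map(tweet_to_row)` between the Pub/Sub read step and the BigQuery write step, with `BQ_SCHEMA` passed as the write step's schema. The actual fields to keep depend on what the downstream Data Studio reports need.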