Skip to content

Course Final Project for CSCI 6502 Big Data Analytics: Systems, Algorithms, and Applications

Notifications You must be signed in to change notification settings

LincolnRychecky/Twitter-Streams-GCP-Dataflow

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Twitter-Streams

Course Final Project for CSCI 6502 Big Data Analytics: Systems, Algorithms, and Applications

This project uses the Twitter Streaming API and Google Cloud to build an end to end pipeline for processing a twitter stream.

(twitter stream) streams_producer.py -> Google Pub Sub Topic -> (apache Beam pipeline) streaming_tweet.py -> Google Big Query Dataset -> Google Data Studio

The necessary setup shell scripts are included for staging, deploying, and starting a Google Cloud Data Flow job that automatically provisions a subscription when given a topic to connect to.

This pipeline as implemented in streaming_tweet.py serves as a template for streaming twitter data into Google Cloud Big Query

About

Course Final Project for CSCI 6502 Big Data Analytics: Systems, Algorithms, and Applications

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published