Commit f53a0a6

Author: Harsh Pandya
Update with grafana instead of quicksight
1 parent bf79b9d commit f53a0a6

1 file changed: +3 -1 lines changed

README.md

@@ -14,7 +14,7 @@ Here, each Kinesis data stream consists of at most 100 tweets. Once the data is
AWS Redshift is a fully managed data warehouse in which tables are created with SQL commands. These tables hold the transactional and aggregated data stored in the S3 bucket. To load data from the S3 buckets into the Redshift data warehouse, COPY commands are used: a connection is made to the Redshift cluster (through SQL Workbench or the Redshift query editor), and COPY commands are run on the cluster to pull data from the buckets into the tables.

- AWS Quicksight is serverless, scalable and fully managed BI service tool that is used for visualization.
+ Grafana running on an EC2 instance is used for visualization.

### System Architecture

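The COPY step described in the context paragraph of the hunk above can be sketched as follows. This is a minimal illustration, not the project's actual loader: the helper name `build_copy_statement`, the table name, the bucket path, the IAM role ARN, and the assumption that the S3 objects are JSON are all hypothetical and do not appear in the README.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement that loads JSON objects from S3.

    Hypothetical helper: the arguments are assumed to be trusted
    configuration values, not user input, since they are interpolated
    directly into the SQL text.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError(f"expected an s3:// URI, got {s3_uri!r}")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS JSON 'auto';"
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(
        build_copy_statement(
            table="tweets",
            s3_uri="s3://example-bucket/tweets/",
            iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
        )
    )
```

The printed statement could then be pasted into SQL Workbench or the Redshift query editor, as the README describes; if the files in the bucket are CSV or Parquet rather than JSON, the `FORMAT AS` clause would need to change accordingly.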
@@ -40,6 +40,8 @@ AWS Kinesis works as a temporary storage mechanism for faster retrieval for furt
EMR provides a fleet of high-powered EC2 instances running widely used distributed processing frameworks such as Hadoop and Spark. It can process terabytes or petabytes of data. EMR writes the data to S3 buckets rather than directly to Redshift for several reasons. A number of different sub-systems may want to consume the processed and aggregated data. S3 storage is much cheaper than Redshift, where we pay for space by the hour. Moreover, S3 reads and writes are cheaper than Redshift reads, where we pay for each request and its data packet size. Redshift's primary goal is to provide a big picture of the data and to query historical data quickly, and querying data in Redshift is much faster than in S3. Hence, S3 is used to reduce cost when this system is part of a larger architecture with many microservices.

+ Once the data is loaded into the Redshift databases, data visualization systems such as Grafana can pull the data and visualize it.
+
#### Steps

1. Run consumer.py: It will not show any output yet, since we have not started fetching tweets from the Twitter API.

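The line added in the second hunk says that Grafana pulls data from Redshift once it is loaded. A rough sketch of the kind of time-bucketed query a Grafana panel might run is shown below. The helper name `build_tweet_count_query`, the table name `tweets`, and the column name `created_at` are assumptions made for illustration; the README does not give the actual schema.

```python
def build_tweet_count_query(table: str, time_column: str, start: str, end: str) -> str:
    """Return SQL that counts rows per minute within [start, end).

    Hypothetical helper: table and column names are assumed to be
    trusted configuration values, since they are interpolated directly.
    """
    return (
        f"SELECT DATE_TRUNC('minute', {time_column}) AS time,\n"
        "       COUNT(*) AS tweet_count\n"
        f"FROM {table}\n"
        f"WHERE {time_column} >= '{start}' AND {time_column} < '{end}'\n"
        "GROUP BY 1\n"
        "ORDER BY 1;"
    )


if __name__ == "__main__":
    # Placeholder table, column, and time range for illustration only.
    print(
        build_tweet_count_query(
            table="tweets",
            time_column="created_at",
            start="2020-01-01 00:00:00",
            end="2020-01-01 01:00:00",
        )
    )
```

Returning one timestamp column plus one numeric column keeps the result in a shape that time-series panels typically expect; the fixed time bounds here would normally be replaced by whatever range the dashboard supplies.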