In this project, I build an end-to-end data engineering pipeline around a real-time global news API using Kafka.
For this I use several technologies: Python, Linux, Amazon Web Services (AWS), Apache Kafka, AWS Glue, Amazon Athena, SQL, and API consumption.
- REST API
- Programming Language - Python
- Linux (infra settings)
- Amazon Web Services (AWS)
  - S3 (Simple Storage Service)
  - Athena
  - Glue Crawler
  - Glue Catalog
  - EC2
- Apache Kafka
The data comes from the XXX API (documentation below).
The data is:
- fetched from the API service
- transformed with a Python script
- streamed through Kafka running on an AWS EC2 instance
- saved to an Amazon S3 bucket
- crawled and cataloged with AWS Glue
- delivered as structured data that can be queried with Amazon Athena (a sketch of these steps follows below).
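
As an illustration of the fetch, transform, and stream steps, here is a minimal Python sketch using the `requests`, `kafka-python`, and `boto3` libraries. The API endpoint, broker address, topic, and bucket names are placeholders I made up, not the project's actual values:

```python
import json

import boto3
import requests
from kafka import KafkaConsumer, KafkaProducer

# All of these values are hypothetical placeholders.
API_URL = "https://example.com/v1/news"               # the real XXX API endpoint goes here
BOOTSTRAP = "ec2-0-0-0-0.compute.amazonaws.com:9092"  # Kafka broker on the EC2 instance
TOPIC = "global_news"
BUCKET = "my-news-bucket"                             # S3 bucket that Glue later crawls


def fetch_articles():
    """Pull the latest articles from the news API."""
    resp = requests.get(API_URL, params={"apiKey": "YOUR_KEY"}, timeout=10)
    resp.raise_for_status()
    return resp.json().get("articles", [])


def transform(article):
    """Keep only the fields the downstream tables need (assumed schema)."""
    return {
        "title": article.get("title"),
        "source": (article.get("source") or {}).get("name"),
        "published_at": article.get("publishedAt"),
        "url": article.get("url"),
    }


def produce():
    """Fetch, transform, and stream each article as a JSON message."""
    producer = KafkaProducer(
        bootstrap_servers=BOOTSTRAP,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    for article in fetch_articles():
        producer.send(TOPIC, transform(article))
    producer.flush()


def consume_to_s3():
    """Read messages from the topic and land each one as a JSON object in S3."""
    s3 = boto3.client("s3")
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BOOTSTRAP,
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        auto_offset_reset="earliest",
    )
    for message in consumer:
        s3.put_object(
            Bucket=BUCKET,
            Key=f"news/record_{message.offset}.json",
            Body=json.dumps(message.value),
        )
```

In practice the producer and consumer run as separate processes (the consumer loop blocks while waiting for new messages), with the consumer typically on the same EC2 instance as the broker.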
I believe this pipeline could serve as a foundation for solving problems in many companies and for delivering information to data or business teams.
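
As an example of what such a team could run once the Glue crawler has cataloged the S3 data, here is a hypothetical Athena query submitted through `boto3`; the database, table, and output-location names are assumptions, not the project's real ones:

```python
import boto3

athena = boto3.client("athena")

# Hypothetical database/table created by the Glue crawler, plus an S3
# location where Athena writes its query results.
response = athena.start_query_execution(
    QueryString="""
        SELECT source, COUNT(*) AS articles
        FROM global_news
        GROUP BY source
        ORDER BY articles DESC
        LIMIT 10
    """,
    QueryExecutionContext={"Database": "news_db"},
    ResultConfiguration={"OutputLocation": "s3://my-news-bucket/athena-results/"},
)
print(response["QueryExecutionId"])
```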
This flow was based on Darshil Parmar's video: https://www.youtube.com/watch?v=KerNf0NANMo&t=1148s
I invite you to subscribe to his channel.