In this project, I build an end-to-end data engineering pipeline around a real-time global news API using Kafka.
For this I use several technologies: Python, Linux, Amazon Web Services (AWS), Apache Kafka, AWS Glue, Amazon Athena, SQL, and API consumption.
- REST API
- Programming Language - Python
- Linux (infra settings)
- Amazon Web Services (AWS)
  - S3 (Simple Storage Service)
  - Athena
  - Glue Crawler
  - Glue Catalog
  - EC2
- Apache Kafka
The data comes from the XXX API (documentation below).
The data is:
- fetched from the API service
- transformed with a Python script
- streamed through Kafka running on an AWS EC2 instance
- saved to an Amazon S3 bucket
- crawled and cataloged with AWS Glue
- delivered as structured data that can be queried with Amazon Athena (a sketch of these steps follows below).
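
As an illustration of the fetch, transform, and stream steps, here is a minimal Python sketch using the `requests`, `kafka-python`, and `boto3` libraries. The API endpoint, broker address, topic, and bucket names are placeholders I made up, not the project's actual values:

```python
import json

import boto3
import requests
from kafka import KafkaConsumer, KafkaProducer

# All of these values are hypothetical placeholders.
API_URL = "https://example.com/v1/news"               # the real XXX API endpoint goes here
BOOTSTRAP = "ec2-0-0-0-0.compute.amazonaws.com:9092"  # Kafka broker on the EC2 instance
TOPIC = "global_news"
BUCKET = "my-news-bucket"                             # S3 bucket that Glue later crawls


def fetch_articles():
    """Pull the latest articles from the news API."""
    resp = requests.get(API_URL, params={"apiKey": "YOUR_KEY"}, timeout=10)
    resp.raise_for_status()
    return resp.json().get("articles", [])


def transform(article):
    """Keep only the fields the downstream tables need (assumed schema)."""
    return {
        "title": article.get("title"),
        "source": (article.get("source") or {}).get("name"),
        "published_at": article.get("publishedAt"),
        "url": article.get("url"),
    }


def produce():
    """Fetch, transform, and stream each article as a JSON message."""
    producer = KafkaProducer(
        bootstrap_servers=BOOTSTRAP,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    for article in fetch_articles():
        producer.send(TOPIC, transform(article))
    producer.flush()


def consume_to_s3():
    """Read messages from the topic and land each one as a JSON object in S3."""
    s3 = boto3.client("s3")
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BOOTSTRAP,
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        auto_offset_reset="earliest",
    )
    for message in consumer:
        s3.put_object(
            Bucket=BUCKET,
            Key=f"news/record_{message.offset}.json",
            Body=json.dumps(message.value),
        )
```

In practice the producer and consumer run as separate processes (the consumer loop blocks while waiting for new messages), with the consumer typically on the same EC2 instance as the broker.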
I believe this pipeline could serve as a foundation for solving problems in many companies and for delivering information to data or business teams.
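
As an example of what such a team could run once the Glue crawler has cataloged the S3 data, here is a hypothetical Athena query submitted through `boto3`; the database, table, and output-location names are assumptions, not the project's real ones:

```python
import boto3

athena = boto3.client("athena")

# Hypothetical database/table created by the Glue crawler, plus an S3
# location where Athena writes its query results.
response = athena.start_query_execution(
    QueryString="""
        SELECT source, COUNT(*) AS articles
        FROM global_news
        GROUP BY source
        ORDER BY articles DESC
        LIMIT 10
    """,
    QueryExecutionContext={"Database": "news_db"},
    ResultConfiguration={"OutputLocation": "s3://my-news-bucket/athena-results/"},
)
print(response["QueryExecutionId"])
```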
This flow was based on Darshil Parmar's video: https://www.youtube.com/watch?v=KerNf0NANMo&t=1148s
I invite you to subscribe to his channel.