Data_pipeline

Built a streaming data pipeline using Kafka and Spark Structured Streaming. The FileReader class reads data from two different file formats, PSV and JSON, applies transformations and filters to the dataset, and persists the resulting dataset to a PostgreSQL database.
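The README does not show the transformation logic itself. As a rough illustration of the kind of parse-and-filter step described (plain Python rather than Spark; the field names and the filter condition are assumptions, not taken from the project):

```python
import json

# Hypothetical record shape; the real dataset's fields are not shown in this README.
FIELDS = ["id", "name", "amount"]

def parse_psv(line):
    """Parse one pipe-separated (PSV) line into a dict."""
    values = line.strip().split("|")
    return dict(zip(FIELDS, values))

def parse_json(line):
    """Parse one JSON line into a dict."""
    return json.loads(line)

def transform(record):
    """Example filter/transformation: keep only records with a positive amount."""
    amount = float(record["amount"])
    if amount <= 0:
        return None  # record is filtered out
    return {**record, "amount": amount}

psv_row = transform(parse_psv("1|alice|42.5"))
json_row = transform(parse_json('{"id": "2", "name": "bob", "amount": "-3"}'))
print(psv_row)   # kept: {'id': '1', 'name': 'alice', 'amount': 42.5}
print(json_row)  # None: filtered out
```

In the actual pipeline the same shape of logic would run inside Spark Structured Streaming operators rather than plain functions.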

Below are the steps:

• Clone the repo/project

• Start the ZooKeeper server using the command below:

.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties

• Start the Kafka server:

.\bin\windows\kafka-server-start.bat .\config\server.properties

• Install PostgreSQL on your local system

• Rebuild the project

• Execute the FileReadertest and FileWritertest classes
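After the steps above, the transformed records are persisted to PostgreSQL. As a self-contained sketch of that final persist step (plain Python, with the standard-library sqlite3 module standing in for PostgreSQL so it runs without a database server; the table and column names are assumptions, not taken from the project):

```python
import sqlite3

# sqlite3 stands in for PostgreSQL here so the sketch runs standalone;
# the "records" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, name TEXT, amount REAL)")

# Rows as they might look after the transform/filter step.
rows = [("1", "alice", 42.5), ("2", "bob", 7.0)]
conn.executemany("INSERT INTO records (id, name, amount) VALUES (?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(count)  # 2
```

Against the real PostgreSQL instance, the equivalent write would typically go through Spark's JDBC sink with the Postgres driver on the classpath.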
