Data_pipeline

Built a streaming data pipeline using Kafka and Spark Structured Streaming. The FileReader class reads data from two different file formats, PSV and JSON, applies transformations and filters to the dataset, and persists the resulting dataset to a PostgreSQL database.
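The README does not show the transformation logic itself. As a rough illustration of the kind of parse-and-filter step described (plain Python rather than Spark; the field names and the filter condition are assumptions, not taken from the project):

```python
import json

# Hypothetical record shape; the real dataset's fields are not shown in this README.
FIELDS = ["id", "name", "amount"]

def parse_psv(line):
    """Parse one pipe-separated (PSV) line into a dict."""
    values = line.strip().split("|")
    return dict(zip(FIELDS, values))

def parse_json(line):
    """Parse one JSON line into a dict."""
    return json.loads(line)

def transform(record):
    """Example filter/transformation: keep only records with a positive amount."""
    amount = float(record["amount"])
    if amount <= 0:
        return None  # record is filtered out
    return {**record, "amount": amount}

psv_row = transform(parse_psv("1|alice|42.5"))
json_row = transform(parse_json('{"id": "2", "name": "bob", "amount": "-3"}'))
print(psv_row)   # kept: {'id': '1', 'name': 'alice', 'amount': 42.5}
print(json_row)  # None: filtered out
```

In the actual pipeline the same shape of logic would run inside Spark Structured Streaming operators rather than plain functions.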

Below are the steps:

• Clone the repo/project

• Start the ZooKeeper server using the command below:

.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties

• Start the Kafka server:

.\bin\windows\kafka-server-start.bat .\config\server.properties

• Install PostgreSQL on your local system

• Rebuild the project

• Execute the FileReadertest and FileWritertest classes
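After the steps above, the transformed records are persisted to PostgreSQL. As a self-contained sketch of that final persist step (plain Python, with the standard-library sqlite3 module standing in for PostgreSQL so it runs without a database server; the table and column names are assumptions, not taken from the project):

```python
import sqlite3

# sqlite3 stands in for PostgreSQL here so the sketch runs standalone;
# the "records" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, name TEXT, amount REAL)")

# Rows as they might look after the transform/filter step.
rows = [("1", "alice", 42.5), ("2", "bob", 7.0)]
conn.executemany("INSERT INTO records (id, name, amount) VALUES (?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(count)  # 2
```

Against the real PostgreSQL instance, the equivalent write would typically go through Spark's JDBC sink with the Postgres driver on the classpath.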
