# etl-effects-of-covid-19-on-trade-at-15-december-2021-provisional

The file spark-lamdo is used to build and configure an Apache Spark cluster with one master and two workers.

Spark version: 3.2.1

The repository contains two directories, apps and data:

Job.py in /apps is a PySpark script that performs ETL (extract, transform and load) of the data into PostgreSQL.

The CSV file and the PostgreSQL JDBC jar are in /data.
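A minimal sketch of what an ETL job like Job.py might look like; the table name, database credentials, CSV columns and aggregation are hypothetical illustrations, not taken from this repository:

```python
# Sketch of a PySpark ETL job writing to PostgreSQL via JDBC.
# All names below (host, database, columns, table) are assumptions.

def jdbc_options(host: str, db: str, user: str, password: str) -> dict:
    """Build the connection options for Spark's JDBC writer."""
    return {
        "url": f"jdbc:postgresql://{host}:5432/{db}",
        "driver": "org.postgresql.Driver",  # shipped in the jar under /opt/spark-data
        "user": user,
        "password": password,
    }

def run_etl():
    # Imported lazily so the helper above is usable without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("covid-trade-etl").getOrCreate()

    # Extract: read the CSV mounted at /opt/spark-data (path assumed).
    df = spark.read.csv("/opt/spark-data/trade.csv",
                        header=True, inferSchema=True)

    # Transform: cast and aggregate (column names are hypothetical).
    agg = (df.withColumn("Value", F.col("Value").cast("double"))
             .groupBy("Country", "Commodity")
             .agg(F.sum("Value").alias("total_value")))

    # Load: write into PostgreSQL through the JDBC driver.
    opts = jdbc_options("postgres", "trade", "spark", "secret")
    (agg.write.format("jdbc")
        .options(**opts)
        .option("dbtable", "trade_totals")
        .mode("overwrite")
        .save())

# When submitted with spark-submit, end the file with a call to run_etl().
```

The JDBC jar passed to spark-submit with --jars (see below) is what provides the org.postgresql.Driver class at runtime.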

# To run:

      docker build -t cluster-apache-spark:3.2.1 .


      docker compose up -d
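The compose file for a one-master/two-worker cluster typically looks like the following sketch; the service names, ports and volume mounts here are assumptions based on the image tag built above and the paths used in the submit command below:

```yaml
version: "3"
services:
  spark-master:
    image: cluster-apache-spark:3.2.1
    ports:
      - "9090:8080"   # master web UI
      - "7077:7077"   # spark:// endpoint used by spark-submit
    volumes:
      - ./apps:/opt/spark-apps
      - ./data:/opt/spark-data
  spark-worker-a:
    image: cluster-apache-spark:3.2.1
    depends_on:
      - spark-master
    environment:
      - SPARK_MASTER=spark://spark-master:7077
    volumes:
      - ./apps:/opt/spark-apps
      - ./data:/opt/spark-data
  spark-worker-b:
    image: cluster-apache-spark:3.2.1
    depends_on:
      - spark-master
    environment:
      - SPARK_MASTER=spark://spark-master:7077
    volumes:
      - ./apps:/opt/spark-apps
      - ./data:/opt/spark-data
```

Mounting apps and data into every container is what makes /opt/spark-apps/Job.py and the files in /opt/spark-data visible to both the driver and the executors.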

# To submit the app, connect to the master or one of the workers and execute:

      /opt/spark/bin/spark-submit --master spark://spark-master:7077 \
      --jars /opt/spark-data/postgresql-42.2.22.jar \
      --driver-memory 1G \
      --executor-memory 1G \
      /opt/spark-apps/Job.py

Screenshots: submitting the app (spark_3), the running job (spark_4), and successful completion (spark_5).

Check the job completion time on the Spark cluster.

About

Use Apache Spark to read, transform and load data into PostgreSQL.
