samuvack/big-data-solution

A big data solution written in Python, based on Hadoop and Spark.

Scale your data management by distributing workload and storage across Hadoop and Spark clusters, and explore and transform your data in Jupyter Notebook.

About The Project

The purpose of this tutorial is to show how to get started with Hadoop, Spark, and Jupyter for your big data solution, deployed as Docker containers.

Architecture overview

Prerequisites

  • Confirmed working only on Linux and Windows (Apple Silicon may have issues).
  • Ensure Docker is installed.

Start

Execute bash master-build.sh to build the images and start the containers.
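The build script launches several containers. One quick way to verify that the three web UIs listed below are reachable is a small TCP port probe; this is a minimal sketch, assuming the default localhost port mappings used in this repository:

```python
import socket

# Default host-port mappings from this tutorial (assumption: compose config unchanged).
SERVICES = {"Hadoop": 9870, "Spark Master": 8080, "Jupyter": 8888}

def is_up(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "up" if is_up("localhost", port) else "not reachable yet"
        print(f"{name} ({port}): {state}")
```

Containers can take a little while to initialize after master-build.sh finishes, so re-run the probe until all three services report up.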

Hadoop

Access the Hadoop UI at http://localhost:9870
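The NameNode on port 9870 also serves the WebHDFS REST API, so you can talk to HDFS from plain Python without any Hadoop client libraries. A minimal sketch, assuming WebHDFS is enabled (the Hadoop default) and exposed on the same port:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

NAMENODE = "http://localhost:9870"  # NameNode UI / WebHDFS port from this setup

def webhdfs_url(path: str, op: str = "LISTSTATUS") -> str:
    """Build a WebHDFS REST URL for an absolute HDFS path."""
    return f"{NAMENODE}/webhdfs/v1{path}?{urlencode({'op': op})}"

def list_hdfs(path: str = "/") -> list:
    """List the entries under an HDFS directory via WebHDFS (cluster must be up)."""
    with urlopen(webhdfs_url(path)) as resp:
        statuses = json.load(resp)["FileStatuses"]["FileStatus"]
    return [s["pathSuffix"] for s in statuses]
```

Once the containers are running, list_hdfs("/") returns the top-level HDFS directories.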

Spark

Access the Spark Master UI at http://localhost:8080

Jupyter

Access the Jupyter UI at http://localhost:8888
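Inside a notebook you can create a Spark session against the cluster. A minimal sketch, assuming pyspark is installed in the Jupyter container and the master is reachable under the hypothetical Docker-network hostname spark-master (adjust the URL to your compose service names):

```python
# Assumption: intended to run inside the Jupyter container, where pyspark is
# installed; the import guard lets the module load elsewhere without it.
try:
    from pyspark.sql import SparkSession
except ImportError:
    SparkSession = None

MASTER_URL = "spark://spark-master:7077"  # hypothetical service name and default port

def make_session(master: str = MASTER_URL, app_name: str = "bigdata-demo"):
    """Return a SparkSession bound to the cluster, or None when pyspark is absent."""
    if SparkSession is None:
        return None
    return SparkSession.builder.master(master).appName(app_name).getOrCreate()

if __name__ == "__main__":
    spark = make_session()
    if spark is not None:
        spark.range(10).show()  # quick smoke test against the cluster
        spark.stop()
```

Jobs submitted from the notebook this way show up under Running Applications in the Spark Master UI.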

Contributing

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/featureName)
  3. Commit your Changes (git commit -m 'Add some featureName')
  4. Push to the Branch (git push origin feature/featureName)
  5. Open a Pull Request

Contact

Martin Karlsson

LinkedIn : martin-karlsson
Twitter : @HelloKarlsson
Email : [email protected]
Webpage : www.martinkarlsson.io

Project Link: github.com/martinkarlssonio/big-data-solution
