This repository holds all the big data content I will be studying.
- BDA
- Day 2
- Day 3
- Day 4
- Day 5
- Day 6
- Day 7
- Day 8
- Day 9
- Day 10
- Day 11
- Day 12
- Day 13
- Day 14
- Day 15
- Day 16
- Day 17
- Day 18
- Day 19
- contains all pdfs -> click here
- RAID
- Java heap and Java basics
- installation and configuration of Hadoop
- the 1st spreadsheet software
- definition of IoT
- Self Driving cars
- data curation
- Linux namespaces and Linux commands
- Unix history
- IBM big data metrics
- throughput
- concurrency
- Notes
- Hadoop commands
- MapReduce introduction
- SDLC
- check Notes
- mapper and reducer with Python
- MapReduce Java example for wordcount
- check Notes
- more exercises
- MapReduce for Ages
- MapReduce for Marks
- click here
- MapReduce for internet usage in usage
- MapReduce for facebook likes on video for Feb, 2018 in insight
- click here
- MapReduce for facebook average shares in 2017
- Map-Only Task
- click here for Notes
- mintemp, maxtemp, forest, funniest post
- all codes no notes -> here
- Hadoop Streaming
- installing R on linux: sudo apt install r-base-core
- Notes -> here
- https://data-flair.training/blogs/hadoop-streaming/
- https://www.glennklockwood.com/data-intensive/hadoop/streaming.html
- https://cran.r-project.org/bin/linux/ubuntu/fullREADME.html
- https://www.mongodb.com/blog/post/hadoop-streaming-support-for-mongodb
- https://www.how2shout.com/linux/how-to-connect-to-aws-ec2-instance-from-ubuntu/
- MapReduce streaming code for Python -> check here
- With Rscript in r_demo
- Wordcount in R
- Multi Node Setup
- Big data ecosystem
- Hive
- Notes -> here
- Hive commands and operations
- Python with Hive
- Notes -> here
- Python with Hive
- built-in functions of Hive
- join, subquery
- Notes -> here
- NoSQL
- HBase introduction
- Notes -> here
- HBase setup
- HBase commands
- Notes -> here
- Spark introduction
- spark-shell practical
- Differences between RDD, DataFrame, and Dataset
- Notes -> here
- pyspark colab
- Discussion on Data Analytics
- Notes -> here
- pyspark colab
- ML, DL from pyspark-colab