warmup_exercise_Hadoop_MapReduce

A simple example of Hadoop MapReduce in Python.

Adapted from here.

If you want to test that the mapper is working, you can do something like this:

python mapper.py < nchs_causes_of_deaths.csv | tail

This feeds the file nchs_causes_of_deaths.csv to mapper.py as input and shows the last few lines of the output.
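For readers who have not seen a streaming mapper before, here is a minimal sketch of the general shape. It is not the mapper.py in this repository: the function names map_lines and main, the key_column parameter, and the choice to emit a count of 1 per row are all assumptions made for illustration.

```python
import csv
import sys


def map_lines(lines, key_column=0):
    """Yield one 'key<TAB>1' record per CSV row (hypothetical mapper logic)."""
    for row in csv.reader(lines):
        # Skip blank rows and rows too short to contain the key column.
        if len(row) <= key_column:
            continue
        key = row[key_column].strip()
        if key:
            # A header row, if present, is treated like any other row here.
            yield f"{key}\t1"


def main():
    # Hadoop streaming feeds input on stdin and reads records from stdout.
    for record in map_lines(sys.stdin):
        print(record)
```

Calling main() from the usual `if __name__ == "__main__":` guard would let a script like this be tested with the same shell redirection shown above.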

Similarly, you can see if the reducer is working like so:

python mapper.py < nchs_causes_of_deaths.csv > map_output.txt
python reducer.py < map_output.txt

This creates the file map_output.txt by processing nchs_causes_of_deaths.csv with mapper.py, and then uses reducer.py to process the map_output.txt file.
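One difference from a real job is worth keeping in mind: Hadoop streaming sorts the mapper output by key before it reaches the reducer, while the local commands above pass it through unsorted. Whether the reducer.py in this repository depends on that ordering is not stated here. The sketch below uses a hypothetical reduce_sorted function and made-up records to show how a reducer that groups adjacent keys behaves with and without sorting.

```python
from itertools import groupby


def reduce_sorted(records):
    """Sum the counts for each run of equal keys in 'key<TAB>count' records."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in records)
    for key, group in groupby(parsed, key=lambda pair: pair[0]):
        yield key, sum(int(count) for _, count in group)


# Hypothetical mapper output, in the order a mapper might emit it.
map_output = ["a\t1", "b\t1", "a\t1"]

# Sorting first mimics the shuffle step, so equal keys end up adjacent.
totals = list(reduce_sorted(sorted(map_output)))

# Without sorting, the two "a" records are not adjacent and stay separate.
unsorted_totals = list(reduce_sorted(map_output))
```

If a reducer works this way, running the mapper output through the shell's sort command before the reducer gives a closer approximation of what Hadoop does.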

The dataset provided, nchs_causes_of_deaths.csv, was adapted from this dataset.

Testing the scripts this way is much easier than debugging errors from Hadoop streaming. Any failures show up as ordinary Python errors, which are easier to read than the complex Java errors Hadoop produces.
