##-------------------------------------------------------------------------------------------------------
## Instructions on how to install HADOOP in a multi-node cluster (>1 node).
## Prepared by Benson Mwangi ([email protected])
## Any references used in these notes are given at the bottom of this note
## (NB: Actual code is tabbed)
##------------------------------------------------------------------------------------------------------
Download Hadoop (2.7.2) and untar it.
wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.2/hadoop-2.7.2.tar.gz
tar -zxvf hadoop-2.7.2.tar.gz
Copy/move the hadoop folder to your installation folder (e.g. /usr/local).
mv hadoop-2.7.2 /usr/local/hadoop
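If you run Hadoop as a dedicated "hadoop" user (the assumption used throughout these notes), give that user ownership of the installation folder. A minimal sketch:
chown -R hadoop:hadoop /usr/local/hadoop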
Update your operating system and install the secure shell (ssh) server (assuming Fedora here).
yum update
yum install openssh-server
Note - make sure you remove "openvpn", otherwise you get the error "ssh: connect to host port 22: no route to host".
yum remove openvpn
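After installing openssh-server, make sure the sshd service is enabled and running (systemd commands; assumes a systemd-based Fedora):
systemctl enable sshd
systemctl start sshd
systemctl status sshd # should report "active (running)"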
Hadoop does not support IPv6, so disable it and force Hadoop to use IPv4.
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
Generate and enable an ssh key on the local machine (do this on all network machines - master and slaves).
The difference between master and slaves is explored in detail elsewhere (see links below).
In addition, copy keys across the hadoop cluster to allow "key-less" (passwordless) ssh login between computers.
cd
ssh-keygen -t rsa -P ""
cat ./.ssh/id_rsa.pub >> ./.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
Copy keys from master to slave and vice versa
user@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub user@slave #master-to-slave ( repeat for slave-to-master)
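To confirm passwordless login works in both directions, run a remote command once from each machine; no password prompt should appear:
ssh slave hostname # from the master; should print the slave's hostname
ssh master hostname # from the slave; should print the master's hostname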
Set up the Hadoop environment variables on the master machine and all slave nodes.
Copy the following into the profile file (i.e. ~/.bashrc). (This assumes you're logged in as user "hadoop".)
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk # This assumes openjdk java 1.8.0
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_INSTALL=$HADOOP_HOME
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native/:$LD_LIBRARY_PATH # Added to fix a native library error that appeared without it.
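Reload the profile and verify the variables took effect (hadoop version is a safe, read-only check):
source ~/.bashrc
echo $HADOOP_HOME # should print /usr/local/hadoop
hadoop version # should report Hadoop 2.7.2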
Edit the hadoop environment file (/usr/local/hadoop/etc/hadoop/hadoop-env.sh) and add $JAVA_HOME (add the following statement at the beginning of the file). Please note that this step has to be performed on both master and slave nodes.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
Configure the master and slave IP addresses by editing /etc/hosts and adding the master and slave IP addresses.
This assumes there are two computers, one acting as a master and the other as a slave. You can add more slaves as necessary.
This configuration has to be done on both machines, i.e. master and slave. Change the IP addresses accordingly.
gedit /etc/hosts
192.168.1.5 master
192.168.1.13 slave
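A quick way to confirm the hosts file works is to ping each machine by name from the other:
ping -c 3 slave # from the master
ping -c 3 master # from the slave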
Add or update information about the slave nodes on the ***MASTER NODE ONLY!!***
Note: this assumes hadoop was installed under /usr/local/hadoop; adjust the path to wherever hadoop was installed.
gedit /usr/local/hadoop/etc/hadoop/slaves # Add the following hostnames to the slaves file.
master
slave
##----HADOOP CONFIGURATION FILES BEGIN HERE (Please pay extra ATTENTION)!-------------##
EDIT core-site.xml (/usr/local/hadoop/etc/hadoop/core-site.xml). Place edits between <configuration> </configuration>.
(NOTE:- EDIT THIS FILE ON BOTH MASTER AND SLAVE NODES!!)
(NOTE:- THE HADOOP DATA FOLDER HAS TO BE CREATED ON ALL NODES, e.g. mkdir /home/hadoop/tmp # matches the hadoop.tmp.dir value below.)
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
<description>A temporary directory.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:54310</value>
<description>Instructions to use HDFS for file storage</description>
</property>
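As noted above, the directory named by hadoop.tmp.dir must exist on every node before the daemons start. A minimal sketch, assuming the hadoop user's home directory:
mkdir -p /home/hadoop/tmp
chown -R hadoop:hadoop /home/hadoop/tmp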
Edit mapred-site.xml (/usr/local/hadoop/etc/hadoop/mapred-site.xml). As above, place these edits between <configuration> </configuration>.
(NOTE:- EDIT THIS FILE ON MASTER NODE ONLY!!)
<property>
<name>mapreduce.jobtracker.address</name>
<value>master:54311</value>
<description>Mapreduce host and jobtracker</description>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Mapreduce Jobs Framework.</description>
</property>
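Note that in Hadoop 2.7.2 mapred-site.xml does not exist by default; the distribution ships a template, so create the file from it before making the edits above:
cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml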
Edit hdfs-site.xml (/usr/local/hadoop/etc/hadoop/hdfs-site.xml).
(NOTE:- EDIT THIS FILE ON BOTH MASTER AND SLAVE NODES!!)
(NOTE:- The value 2 in dfs.replication below sets the number of block replicas.)
(NOTE:- Remember to change the namenode/datanode dir (e.g. <value>/home/hadoop/hadoop-data</value>) to suit your setup.)
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/hadoop-data</value>
<description>Determines where on the local filesystem the DFS name node should store the name table (fsimage). The namenode is the master and only stores the metadata of HDFS.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/hadoop-data</value>
<description>Determines where on the local filesystem an DFS data node should store its blocks.Data node - Stores data in the Hadoop file system</description>
</property>
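The directory named by dfs.namenode.name.dir / dfs.datanode.data.dir must also exist on every node, e.g.:
mkdir -p /home/hadoop/hadoop-data
chown -R hadoop:hadoop /home/hadoop/hadoop-data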
Edit yarn-site.xml (/usr/local/hadoop/etc/hadoop/yarn-site.xml)
(NOTE:- EDIT THIS FILE ON MASTER and SLAVE NODES)
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
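Since core-site.xml, hdfs-site.xml and yarn-site.xml must match on master and slave, one convenient (optional) way to keep them in sync is to copy them from the master over ssh after editing:
scp /usr/local/hadoop/etc/hadoop/core-site.xml /usr/local/hadoop/etc/hadoop/hdfs-site.xml /usr/local/hadoop/etc/hadoop/yarn-site.xml slave:/usr/local/hadoop/etc/hadoop/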
Format the NAMENODE before starting the cluster by typing the following command (run this once, on the master node; formatting erases any existing HDFS metadata).
hdfs namenode -format
Start/Stop the cluster using the following commands (assuming all went well with the ~/.bashrc environment variables above).
start-dfs.sh
stop-dfs.sh
Start/Stop Yarn using the following commands (same assumption as above).
start-yarn.sh
stop-yarn.sh
To check if everything went well, type the following command on each node.
NOTE: On the master node the output should list NameNode, SecondaryNameNode and DataNode; on each slave node it should list DataNode. (The master also runs a DataNode here because it is listed in the slaves file.)
NOTE: As Yarn is also running, you should see ResourceManager and NodeManager on the master node and NodeManager on the slave nodes.
jps
The following Web UIs give visual access to the HDFS and YARN services in a browser.
Name Node: master:50070
YARN Services: master:8088
Secondary Name Node: master:50090
Data Node 1: master:50075
Data Node 2: slave:50075
Or use the following command
hdfs dfsadmin -report
NOTE: If all services were successfully started in all nodes they should be visible here
http://master:8088/cluster/nodes
The following example should execute properly if the installation was successful
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 30 100
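Another quick end-to-end test is the bundled wordcount example, which also exercises HDFS reads and writes (the file and directory names here are arbitrary choices for illustration):
hdfs dfs -mkdir -p /user/hadoop/input
hdfs dfs -put /usr/local/hadoop/etc/hadoop/*.xml /user/hadoop/input
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/hadoop/input /user/hadoop/output
hdfs dfs -cat /user/hadoop/output/part-r-00000 | head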
#---------END---------------------------------------------------------------------------------------------------------------------------------
#---------REFERENCES--------------------------------------------------------------------------------------------------------------------------
1) https://www.researchgate.net/post/how_to_configure_hadoop_271_ubuntu_1404_multinode_cluster
2) http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
3) https://mfaizmzaki.com/2015/12/17/how-to-install-hadoop-2-7-1-multi-node-cluster-on-amazon-aws-ec2-instance-improved-part-1/