Monday, 26 August 2013

Hadoop Single-Node Cluster on CentOS

Configuring a Hadoop Single-Node Cluster on CentOS 6.3 (64-bit)

Step 1:
      Download, install, and configure Java 1.6

Step 2:
      Download and configure Hadoop 0.20.2

Step 3:
      Download and configure Eclipse

Step 1:

             I.     In CentOS, installing JDK 1.6 is easy; follow the steps below:

 i.     Go to (at the top panel) System --> Administration --> Add/Remove Software (if it asks for authentication, click through), type "JDK" in the search box, and click Find.
ii.     Search for OpenJDK Development Environment (version 1.6), select it, and click Apply.
iii.     Once the installation is done, open a terminal and type      java -version      (if you get a version number, Java was installed successfully).

           II.     Setting Environment Variables in the Terminal

In a terminal, type --->    $ vi ~/.bash_profile   (Enter)
A visual editor opens --> press   i   and add the following line (note: no space after the =):

export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64

Save with Esc then :wq, and reload the profile with    $ source ~/.bash_profile
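For reference, a minimal sketch of the lines worth keeping in ~/.bash_profile (the JAVA_HOME path is the OpenJDK 1.6 default used in this guide; adjust it to match your machine):

```shell
# Append these to ~/.bash_profile, then run: source ~/.bash_profile
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64
export PATH=$PATH:$JAVA_HOME/bin
# Confirm the variable is set in the current shell:
echo "JAVA_HOME is $JAVA_HOME"
```

Putting $JAVA_HOME/bin on PATH is optional but lets you call java by name even when the RPM did not install a symlink into /usr/bin.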



Step 2:

             I.     Create a system user account to use for the Hadoop installation.


# useradd huser
# passwd huser

Changing password for user huser.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.


           II.     Configuring Key-Based Login

Hadoop requires the huser account to be able to ssh to itself without a password. The following commands enable key-based login for that user.


# su - huser
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
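The chmod 0600 step matters because sshd refuses an authorized_keys file that other users can read or write. A quick stand-alone illustration of that permission mode, using a scratch file rather than the real key file:

```shell
# Create a scratch file and give it the same 0600 mode used above
f=$(mktemp)
chmod 0600 "$f"
stat -c '%a' "$f"    # prints the octal mode: 600
rm -f "$f"
```

After the key setup, a good sanity check is `ssh localhost` as huser; it should log you in without prompting for a password.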


         III.     Download the stable version 0.20.2 of Hadoop

 i.     Download the tarball from the link below:

http://www.mediafire.com/download/dtv7k2bhvkfoq64/hadoop-0.20.2.tar.gz





ii.     Extract the Hadoop tar file

# mkdir /opt/huser
# cp /home/huser/Downloads/hadoop-0.20.2.tar.gz /opt/huser/
# cd /opt/huser/
# tar -xzf hadoop-0.20.2.tar.gz
# mv hadoop-0.20.2 hadoop
# chown -R huser /opt/huser
# cd /opt/huser/hadoop/
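The extract-and-rename pattern above (unpack the versioned tarball, then rename the directory to a stable name) can be tried on a scratch tarball first; everything below is a stand-in, not the real Hadoop archive:

```shell
# Build a dummy hadoop-0.20.2.tar.gz in a temp dir (stand-in only)
tmp=$(mktemp -d)
cd "$tmp"
mkdir hadoop-0.20.2
echo "demo" > hadoop-0.20.2/README
tar -czf hadoop-0.20.2.tar.gz hadoop-0.20.2
rm -rf hadoop-0.20.2
# The same two steps used in the guide:
tar -xzf hadoop-0.20.2.tar.gz
mv hadoop-0.20.2 hadoop
cat hadoop/README    # prints: demo
```

Renaming to a plain `hadoop` directory keeps configs and PATH entries stable across future version upgrades.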

iii.     Configure Hadoop

Make the following additions to the corresponding files in /opt/huser/hadoop/conf:
         * core-site.xml (inside the configuration tags)
                 <property>
                          <name>fs.default.name</name>
                          <value>hdfs://localhost:8020</value>
                 </property>
         * mapred-site.xml (inside the configuration tags)
                 <property>
                          <name>mapred.job.tracker</name>
                          <value>localhost:8021</value>
                 </property>
         * hdfs-site.xml (inside the configuration tags)
                 <property>
                          <name>dfs.replication</name>
                          <value>1</value>
                 </property>
         * hadoop-env.sh
                 uncomment the JAVA_HOME export line and set it to your Java home (no space after the =):

export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64

Set the JAVA_HOME path according to your system's Java installation.
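The three XML edits can also be scripted. A sketch that writes complete files with here-documents (it writes to a scratch directory here; on a real install the target would be /opt/huser/hadoop/conf, and you would merge with any existing properties rather than overwrite):

```shell
CONF=$(mktemp -d)   # stand-in for /opt/huser/hadoop/conf

cat > "$CONF/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
EOF

cat > "$CONF/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
EOF

cat > "$CONF/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

# Quick check: list the configured values
grep -h '<value>' "$CONF"/*.xml
```

The `dfs.replication` value of 1 is what makes this a single-node setup: with only one DataNode there is nowhere to place a second block replica.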

iv.     Format the NameNode

# su - huser
$ cd /opt/huser/hadoop
$ bin/hadoop namenode -format

You will see output similar to the sample below (captured on a different host and a newer Hadoop build, so the version, hostname, paths, and timestamps will differ on your machine):

29/06/13 07:24:20 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = srv1.tecadmin.net/192.168.1.90
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.0
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473; compiled by 'hortonfo' on Mon May 6 06:59:37 UTC 2013
STARTUP_MSG:   java = 1.7.0_17
************************************************************/
29/06/13 07:24:20 INFO util.GSet: Computing capacity for map BlocksMap
29/06/13 07:24:20 INFO util.GSet: VM type = 32-bit
29/06/13 07:24:20 INFO util.GSet: 2.0% max memory = 1013645312
29/06/13 07:24:20 INFO util.GSet: capacity = 2^22 = 4194304 entries
29/06/13 07:24:20 INFO util.GSet: recommended=4194304, actual=4194304
29/06/13 07:24:20 INFO namenode.FSNamesystem: fsOwner=hadoop
29/06/13 07:24:20 INFO namenode.FSNamesystem: supergroup=supergroup
29/06/13 07:24:20 INFO namenode.FSNamesystem: isPermissionEnabled=true
29/06/13 07:24:20 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
29/06/13 07:24:20 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
29/06/13 07:24:20 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
29/06/13 07:24:20 INFO namenode.NameNode: Caching file names occuring more than 10 times
29/06/13 07:24:20 INFO common.Storage: Image file of size 112 saved in 0 seconds.
29/06/13 07:24:20 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/opt/hadoop/hadoop/dfs/name/current/edits
29/06/13 07:24:20 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/opt/hadoop/hadoop/dfs/name/current/edits
29/06/13 07:24:20 INFO common.Storage: Storage directory /opt/hadoop/hadoop/dfs/name has been successfully formatted.
29/06/13 07:24:20 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at srv1.tecadmin.net/192.168.1.90
************************************************************/

 v.     Start Hadoop Services

$ bin/start-all.sh


vi.     Test and Access Hadoop Services

$ jps

You should see output similar to this:

26049 SecondaryNameNode
25929 DataNode
26399 Jps
26129 JobTracker
26249 TaskTracker
25807 NameNode

Web Access URLs for Services
http://localhost:50030/   for the JobTracker
http://localhost:50070/   for the NameNode
http://localhost:50060/   for the TaskTracker

Tip: add the Hadoop bin directory to your PATH:
export PATH=$PATH:/opt/huser/hadoop/bin
That's it; you can now run Hadoop commands from any directory in the terminal.
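The PATH tip works because the shell searches each listed directory for an executable matching the command name. A self-contained demonstration of the mechanism with a stand-in script (hadoop-demo is hypothetical, used only so the example does not need a running cluster):

```shell
# Put a tiny executable in a temp directory (stand-in for hadoop's bin/)
bindir=$(mktemp -d)
printf '#!/bin/sh\necho "wrapper reached"\n' > "$bindir/hadoop-demo"
chmod +x "$bindir/hadoop-demo"
# Append the directory to PATH, as in the tip above
export PATH=$PATH:$bindir
# Now callable by bare name from any working directory:
hadoop-demo    # prints: wrapper reached
```

With the real /opt/huser/hadoop/bin on PATH, a quick smoke test once the daemons are up is `hadoop fs -ls /`, which should list the (initially empty) HDFS root.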


Congratulations, you have successfully configured Hadoop on CentOS!

