Monday, 26 August 2013

Hadoop Single-Node Cluster on CentOS

Configuring a Hadoop Single-Node Cluster on CentOS 6.3 (64-bit)

Step 1:
      Download, install, and configure Java 1.6

Step 2:
      Download and configure Hadoop 0.20.2

Step 3:
      Download and configure Eclipse

Step 1:

             I.     On CentOS it is easy to install JDK 1.6; follow the
steps below.

 i.     Go to (at the top panel) System --> Administration --> Add/Remove Software (if it asks for authentication, click anyway) --> in the Search box type JDK, then click Find
ii.     Search for OpenJDK Development Environment (version 1.6), select it, then click Apply
iii.     Once the installation is done, open a terminal and type      java -version      (if you get a version number, you installed Java successfully)
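
If the installation succeeded, the output looks roughly like the following (the exact version and build strings here are illustrative and will vary by system):

$ java -version
java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)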

           II.     Setting Environment Variables in Terminal

export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64

and to make it permanent, in the terminal type --->    $ vi ~/.bash_profile   (enter)
you will see a visual editor --> press i and add the line
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64
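
Equivalently, you can append the line and reload the profile without opening an editor (a minimal sketch; the JDK path is the usual CentOS location, so confirm it on your system first):

$ echo 'export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64' >> ~/.bash_profile
$ source ~/.bash_profile
$ echo $JAVA_HOME      (should print the JDK path)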



Step 2:

             I.     Create a system user account to use for the Hadoop installation.


# useradd huser
# passwd huser

Changing password for user huser.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.


           II.     Configuring Key-Based Login

Hadoop requires the huser account to be able to SSH to itself without a password. The following commands enable key-based login for huser.


# su - huser
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
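
You can verify that passwordless login works (the first connection may ask you to confirm the host key):

# su - huser
$ ssh localhost      (should log in without asking for a password)
$ exit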


         III.     Download the stable version 0.20.2 of Hadoop

 i.     download it from the link below

http://www.mediafire.com/download/dtv7k2bhvkfoq64/hadoop-0.20.2.tar.gz





ii.     Extract the Hadoop tar file

# mkdir /opt/huser
# cp /home/huser/Downloads/hadoop-0.20.2.tar.gz /opt/huser/
# cd /opt/huser/
# tar -xzf hadoop-0.20.2.tar.gz
# mv hadoop-0.20.2 hadoop
# chown -R huser /opt/huser
# cd /opt/huser/hadoop/

iii.     Configure Hadoop

Make the following additions to the corresponding files in hadoop/conf:
         * core-site.xml (inside the configuration tags)
                 <property>
                          <name>fs.default.name</name>
                          <value>hdfs://localhost:8020</value>
                 </property>
         * mapred-site.xml (inside the configuration tags)
                 <property>
                          <name>mapred.job.tracker</name>
                          <value>localhost:8021</value>
                 </property>
         * hdfs-site.xml (inside the configuration tags)
                 <property>
                          <name>dfs.replication</name>
                          <value>1</value>
                 </property>
         * hadoop-env.sh
                 uncomment the JAVA_HOME export line and set it to your Java home

export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64

Set the JAVA_HOME path as per your system's Java installation.
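
Before formatting the NameNode you can sanity-check the setup; if JAVA_HOME is picked up correctly, this prints the Hadoop version and exits cleanly:

# su - huser
$ cd /opt/huser/hadoop
$ bin/hadoop version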

iv.     Format Name Node

# su - huser
$ cd /opt/huser/hadoop
$ bin/hadoop namenode -format   (you will get output similar to the following; the hostnames, versions, and paths shown are from a different machine and will differ on yours)
29/06/13 07:24:20 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = srv1.tecadmin.net/192.168.1.90
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.0
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473; compiled by 'hortonfo' on Mon May 6 06:59:37 UTC 2013
STARTUP_MSG:   java = 1.7.0_17
************************************************************/
29/06/13 07:24:20 INFO util.GSet: Computing capacity for map BlocksMap
29/06/13 07:24:20 INFO util.GSet: VM type = 32-bit
29/06/13 07:24:20 INFO util.GSet: 2.0% max memory = 1013645312
13/06/02 22:53:48 INFO util.GSet: capacity = 2^22 = 4194304 entries
29/06/13 07:24:20 INFO util.GSet: recommended=4194304, actual=4194304
29/06/13 07:24:20 INFO namenode.FSNamesystem: fsOwner=hadoop
13/06/02 22:53:49 INFO namenode.FSNamesystem: supergroup=supergroup
29/06/13 07:24:20 INFO namenode.FSNamesystem: isPermissionEnabled=true
29/06/13 07:24:20 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
29/06/13 07:24:20 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
29/06/13 07:24:20 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
29/06/13 07:24:20 INFO namenode.NameNode: Caching file names occuring more than 10 times
29/06/13 07:24:20 INFO common.Storage: Image file of size 112 saved in 0 seconds.
29/06/13 07:24:20 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/opt/hadoop/hadoop/dfs/name/current/edits
29/06/13 07:24:20 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/opt/hadoop/hadoop/dfs/name/current/edits
29/06/13 07:24:20 INFO common.Storage: Storage directory /opt/hadoop/hadoop/dfs/name has been successfully formatted.
29/06/13 07:24:20 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at srv1.tecadmin.net/192.168.1.90
************************************************************/

 v.     Start Hadoop Services

$ bin/start-all.sh
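
To stop all the daemons later, use the matching stop script from the same directory:

$ bin/stop-all.sh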


vi.     Test and Access Hadoop Services

$ jps     (you will get output like this)

26049 SecondaryNameNode
25929 DataNode
26399 Jps
26129 JobTracker
26249 TaskTracker
25807 NameNode

Web Access URLs for Services
http://localhost:50030/   for the JobTracker
http://localhost:50070/   for the NameNode
http://localhost:50060/   for the TaskTracker

Tip: add the Hadoop bin directory to your PATH environment variable
export PATH=$PATH:/opt/huser/hadoop/bin
That's it; now you can run Hadoop from any directory in the terminal.
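
As a quick smoke test of HDFS (as huser, with the daemons running; "test" is just an example directory name, created under /user/huser in HDFS):

$ hadoop fs -mkdir test
$ hadoop fs -ls      (should list the new test directory)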


Congratulations, you have successfully configured Hadoop on CentOS.



Hadoop Single-Node Cluster on Ubuntu 12.04 LTS

Configuring a Hadoop Single-Node Cluster on Ubuntu 12.04 (64-bit)

Step 1:
      Download, install, and configure Java 1.6

Step 2:
      Download and configure Hadoop 0.20.2

Step 1:

             I.     On Ubuntu it is easy to install JDK 1.6; follow the
steps below.

 i.     Open a terminal and type

$ sudo apt-get install openjdk-6-jdk (enter)
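
To confirm where the JDK landed (java-6-openjdk-amd64 is the usual directory on 64-bit Ubuntu, but it is worth checking):

$ ls /usr/lib/jvm/
java-6-openjdk-amd64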

ii.     Setting Environment Variables in Terminal

$ export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
                        (usual path; check once on your system)

and to make it permanent, in the terminal type --->

$ vim ~/.bashrc (enter)

you will see a visual editor --> press i and add the line

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
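
Equivalently, append the line and reload .bashrc without opening an editor (a minimal sketch; confirm the JDK path first):

$ echo 'export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64' >> ~/.bashrc
$ source ~/.bashrc
$ echo $JAVA_HOME      (should print the JDK path)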






Step 2:
             I.     Create a system user account to use for the Hadoop installation.

type following in terminal

$ sudo addgroup hgroup (enter)
$ sudo adduser --ingroup hgroup huser (enter)
Adding user `huser' ...
Adding new user `huser' (1002) with group `hgroup' ...
Creating home directory `/home/huser' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for huser
Enter the new value, or press ENTER for the default
      Full Name []:
      Room Number []:
      Work Phone []:
      Home Phone []:
      Other []:
Is the information correct? [Y/n] Y (enter)

           II.     Configuring SSH Key-Based Login

Hadoop requires the huser account to be able to SSH to itself without a password. The following commands enable key-based login for huser.


$ su - huser
"huser@localhost:~$" you will get like this
$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
$ chmod 0600 $HOME/.ssh/authorized_keys



$ ssh localhost
            (if it succeeds you will get a welcome message)
            (if it fails you will get
            ssh: connect to host localhost port 22: Connection refused)
only if it fails, install the SSH server as a sudo-capable user:
$ sudo apt-get install openssh-server
then run again
$ ssh localhost

         III.     Disabling IPV6

 i.     you need to run this command as a super user
$ sudo vim /etc/sysctl.conf
(press i and copy the lines below to the end of the file)

          # disable ipv6
          net.ipv6.conf.all.disable_ipv6 = 1
          net.ipv6.conf.default.disable_ipv6 = 1
          net.ipv6.conf.lo.disable_ipv6 = 1

ii.     test whether IPv6 is disabled by running the command below

$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6

Reboot your machine for the change to take effect (or apply it immediately with the command shown below).

A return value of 0 means IPv6 is enabled; a value of 1 means it is disabled (that's what we want).
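
Alternatively, the settings in /etc/sysctl.conf can usually be applied immediately, without a reboot:

$ sudo sysctl -p      (reloads /etc/sysctl.conf and applies the values)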


           IV.     Download the stable version 0.20.2 of Hadoop

 i.     download it from the link below

http://www.mediafire.com/download/dtv7k2bhvkfoq64/hadoop-0.20.2.tar.gz




ii.     Extract the Hadoop tar file (as a super user)

$ sudo cp /home/huser/Downloads/hadoop-0.20.2.tar.gz /usr/local
$ cd /usr/local/
$ sudo tar -xzf hadoop-0.20.2.tar.gz
$ sudo mv hadoop-0.20.2 hadoop
$ sudo chown -R huser:hgroup hadoop
$ su - huser
$ cd /usr/local/hadoop/ (now we are in the Hadoop home directory)

iii.     Configure Hadoop

Make the following additions to the corresponding files in hadoop/conf.
run from huser --->
$ vim conf/core-site.xml
      * core-site.xml (inside the configuration tags)
            <property>
                  <name>fs.default.name</name>
                  <value>hdfs://localhost:8020</value>
            </property>
$ vim conf/mapred-site.xml
      * mapred-site.xml (inside the configuration tags)
            <property>
                  <name>mapred.job.tracker</name>
                  <value>localhost:8021</value>
            </property>
$ vim conf/hdfs-site.xml
      * hdfs-site.xml (inside the configuration tags)
            <property>
                  <name>dfs.replication</name>
                  <value>1</value>
            </property>

$ vim conf/hadoop-env.sh
      * hadoop-env.sh
            uncomment the JAVA_HOME export command, and set the path to your Java home

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64

Set the JAVA_HOME path as per your system's Java installation.
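
A quick way to confirm the edit took effect (run from /usr/local/hadoop):

$ grep JAVA_HOME conf/hadoop-env.sh      (the export line should no longer be commented out)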

iv.     Format Name Node

(from huser only)
$ cd /usr/local/hadoop
$ bin/hadoop namenode -format   (you will get output similar to the following; the hostnames, versions, and paths shown are from a different machine and will differ on yours)
29/06/13 07:24:20 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = srv1.tecadmin.net/192.168.1.90
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.0
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473; compiled by 'hortonfo' on Mon May 6 06:59:37 UTC 2013
STARTUP_MSG:   java = 1.7.0_17
************************************************************/
29/06/13 07:24:20 INFO util.GSet: Computing capacity for map BlocksMap
29/06/13 07:24:20 INFO util.GSet: VM type = 32-bit
29/06/13 07:24:20 INFO util.GSet: 2.0% max memory = 1013645312
13/06/02 22:53:48 INFO util.GSet: capacity = 2^22 = 4194304 entries
29/06/13 07:24:20 INFO util.GSet: recommended=4194304, actual=4194304
29/06/13 07:24:20 INFO namenode.FSNamesystem: fsOwner=hadoop
13/06/02 22:53:49 INFO namenode.FSNamesystem: supergroup=supergroup
29/06/13 07:24:20 INFO namenode.FSNamesystem: isPermissionEnabled=true
29/06/13 07:24:20 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
29/06/13 07:24:20 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
29/06/13 07:24:20 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
29/06/13 07:24:20 INFO namenode.NameNode: Caching file names occuring more than 10 times
29/06/13 07:24:20 INFO common.Storage: Image file of size 112 saved in 0 seconds.
29/06/13 07:24:20 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/opt/hadoop/hadoop/dfs/name/current/edits
29/06/13 07:24:20 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/opt/hadoop/hadoop/dfs/name/current/edits
29/06/13 07:24:20 INFO common.Storage: Storage directory /opt/hadoop/hadoop/dfs/name has been successfully formatted.
29/06/13 07:24:20 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at srv1.tecadmin.net/192.168.1.90
************************************************************/

 v.     Start Hadoop Services

$ bin/start-all.sh
the output will look like
huser@ubuntu:/usr/local/hadoop$ bin/start-all.sh
starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-huser-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-huser-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-huser-secondarynamenode-ubuntu.out
starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-huser-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-huser-tasktracker-ubuntu.out
huser@ubuntu:/usr/local/hadoop$
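
If any of the daemons fails to start, check its log file under /usr/local/hadoop/logs (the file name includes your user name and host name, so it will differ on your system):

$ tail -n 50 /usr/local/hadoop/logs/hadoop-huser-namenode-ubuntu.log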






vi.     Test and Access Hadoop Services

$ jps     (you will get output like this)

26049 SecondaryNameNode
25929 DataNode
26399 Jps
26129 JobTracker
26249 TaskTracker
25807 NameNode

Web Access URLs for Services
http://localhost:50030/   for the JobTracker
http://localhost:50070/   for the NameNode
http://localhost:50060/   for the TaskTracker

Creating HOME paths for Hadoop and Java in the .bashrc file
run from huser
$ vim $HOME/.bashrc
and press i, add following line at end of the file
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
# Add the Hadoop bin directory to the PATH, so the aliases below work
export PATH=$PATH:$HADOOP_HOME/bin
# Set JAVA_HOME (also configured directly for Hadoop in conf/hadoop-env.sh)
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}
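
After reloading the file, the aliases are available in the shell (a short example session):

$ source ~/.bashrc
$ fs -ls /       (same as: hadoop fs -ls /)
$ hls /          (same as: hadoop fs -ls /)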

That's it; now you can run Hadoop from any directory in the terminal.

Congratulations, you have successfully configured Hadoop on Ubuntu.

