Configuring a Hadoop Single-Node Cluster on Ubuntu 12.04 (64-bit)
Step:1 Download, Install and Configure Java 1.6
Step:2 Download and Configure Hadoop 0.20.2
Step:1 Install and Configure Java 1.6
i. Open a terminal and type
$ sudo apt-get install openjdk-6-jdk (enter)
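ii. Set the environment variables. In the terminal, type
$ export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
(this is the usual path; check it on your system)
This setting lasts only for the current terminal session. To make it permanent, add it to your ~/.bashrc (no sudo is needed to edit your own file):
$ vim ~/.bashrc (enter)
Vim opens; press i to enter insert mode, add the following line at the end of the file, then press Esc and type :wq to save and quit:
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64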
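Because the exact JDK directory can differ between systems, here is a minimal sketch of one way to find it instead of guessing. It assumes, as on Ubuntu, that /usr/bin/javac is a chain of symlinks (through /etc/alternatives) ending in <jdk-home>/bin/javac. The function name java_home_from_javac is hypothetical, not part of Ubuntu or Hadoop.

```shell
# Hypothetical helper: print the JDK home that a javac binary belongs to.
# readlink -f follows the whole symlink chain (e.g. via /etc/alternatives)
# and the trailing /bin/javac is then stripped off.
java_home_from_javac() {
    javac_path=${1:-$(command -v javac)}
    [ -n "$javac_path" ] || return 1            # no javac on PATH
    resolved=$(readlink -f "$javac_path") || return 1
    case "$resolved" in
        */bin/javac) printf '%s\n' "${resolved%/bin/javac}" ;;
        *) return 1 ;;                           # not a JDK javac layout
    esac
}

# Usage, after installing openjdk-6-jdk:
#   export JAVA_HOME="$(java_home_from_javac)"
#   echo "$JAVA_HOME"
```

On a stock 12.04 amd64 install this should print the same /usr/lib/jvm/java-6-openjdk-amd64 path used above, but trust what it prints on your machine over the example path.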
Step:2 Download and Configure Hadoop 0.20.2
I. Create a dedicated system user account for the Hadoop installation.
Type the following in a terminal:
$ sudo addgroup hgroup (enter)
$ sudo adduser --ingroup hgroup huser (enter)
Adding user `huser' ...
Adding new user `huser' (1002) with group `hgroup' ...
Creating home directory `/home/huser' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for huser
Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] Y (enter)
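Before moving on, you may want to confirm that the account exists and landed in the right group. A minimal sketch built only on the standard id command; user_in_group is a hypothetical helper name.

```shell
# Hypothetical helper (not part of Hadoop or Ubuntu): succeed when user $1
# exists and is a member of group $2, according to `id -nG`.
user_in_group() {
    # id fails (and prints nothing useful) if the user does not exist.
    member_of=$(id -nG "$1" 2>/dev/null) || return 1
    # One group name per line, then look for an exact, fixed-string match.
    printf '%s\n' $member_of | grep -qxF -e "$2"
}

# Usage, after the addgroup/adduser commands above:
#   user_in_group huser hgroup && echo "huser is in hgroup"
```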
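II. Configuring SSH Key-Based Login
Hadoop's start and stop scripts use SSH to launch daemons on the hosts listed in conf/masters and conf/slaves (here, just localhost), so the Hadoop user must be able to ssh to itself without a password. The following steps enable key-based login for huser: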
$ su - huser
(your prompt changes to something like "huser@localhost:~$")
$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
$ ssh localhost
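(if this succeeds, you will see a welcome message; on the very first connection ssh also asks you to confirm the host key, so type yes)
If it fails with
ssh: connect to host localhost port 22: Connection refused
the SSH server is probably not installed or not running. Only in that case, install it from an account with sudo rights (huser has none):
$ sudo apt-get install openssh-server
then, as huser, run again
$ ssh localhost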
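Running the cat >> line more than once appends duplicate keys, and sshd's StrictModes setting (on by default) ignores authorized_keys when ~/.ssh or the file is writable by others. A minimal sketch of an idempotent version of the steps above, run as huser; authorize_own_key is a hypothetical helper name.

```shell
# Hypothetical helper: authorize the user's own RSA public key for ssh to
# localhost. Safe to re-run: the key is appended only once.
authorize_own_key() {
    ssh_dir=${1:-$HOME/.ssh}
    pub="$ssh_dir/id_rsa.pub"
    keys="$ssh_dir/authorized_keys"
    [ -f "$pub" ] || return 1               # generate the key with ssh-keygen first
    chmod 700 "$ssh_dir"                    # StrictModes rejects group/world-writable dirs
    touch "$keys"
    # Append the public key only if that exact line is not there yet.
    grep -qxF -e "$(cat "$pub")" "$keys" || cat "$pub" >> "$keys"
    chmod 600 "$keys"
}

# Usage, as huser:
#   ssh-keygen -t rsa -P "" -f "$HOME/.ssh/id_rsa"
#   authorize_own_key
#   ssh localhost
```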
III. Disabling IPv6
i. Run this command as a super user (an account with sudo rights):
$ sudo vim /etc/sysctl.conf
Press i and paste the following lines at the end of the file, then save with Esc and :wq:
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
ii. Reboot your machine so that the changes take effect, then test whether IPv6 is disabled:
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
A return value of 0 means IPv6 is enabled; a value of 1 means it is disabled (which is what we want).
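As an alternative to rebooting, sudo sysctl -p reloads /etc/sysctl.conf immediately. The check above can also be wrapped so it prints a word instead of a bare digit; a minimal sketch, where ipv6_state is a hypothetical helper name.

```shell
# Hypothetical helper: translate a disable_ipv6 flag file into a word.
# The kernel reports 0 (IPv6 enabled) or 1 (IPv6 disabled).
ipv6_state() {
    flag_file=${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}
    [ -r "$flag_file" ] || { echo "unknown"; return 1; }
    case "$(cat "$flag_file")" in
        1) echo "disabled" ;;
        0) echo "enabled" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Usage:
#   sudo sysctl -p     # apply /etc/sysctl.conf without a reboot
#   ipv6_state         # should print "disabled"
```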
IV. Download Hadoop 0.20.2
i. Download the stable version 0.20.2 from the link below:
http://www.mediafire.com/download/dtv7k2bhvkfoq64/hadoop-0.20.2.tar.gz
ii. Extract the Hadoop tar file (as a super user; adjust the first path if your browser saved the file elsewhere)
$ sudo cp /home/huser/Downloads/hadoop-0.20.2.tar.gz /usr/local
$ cd /usr/local/
$ sudo tar -xzf hadoop-0.20.2.tar.gz
$ sudo mv hadoop-0.20.2 hadoop
$ sudo chown -R huser:hgroup hadoop
$ su - huser
$ cd /usr/local/hadoop/ (now we are in the Hadoop home directory)
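The copy, extract, rename and chown commands above can be combined into one function that also refuses to clobber an existing install (a plain mv onto an existing /usr/local/hadoop would nest the new tree inside it). A minimal sketch; install_hadoop_tarball is a hypothetical name, and the defaults /usr/local and huser:hgroup simply mirror this tutorial.

```shell
# Hypothetical helper: extract a Hadoop tarball to <dest>/hadoop and hand it
# to the Hadoop user. Run it as root (e.g. from a root shell or via sudo sh).
install_hadoop_tarball() {
    tarball=$1
    dest=${2:-/usr/local}
    owner=${3:-huser:hgroup}
    # Top-level directory inside the archive, e.g. hadoop-0.20.2
    top=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
    [ -n "$top" ] || return 1
    if [ -e "$dest/hadoop" ]; then
        echo "$dest/hadoop already exists; not overwriting" >&2
        return 1
    fi
    tar -xzf "$tarball" -C "$dest" || return 1
    mv "$dest/$top" "$dest/hadoop" && chown -R "$owner" "$dest/hadoop"
}

# Usage (as root):
#   install_hadoop_tarball /home/huser/Downloads/hadoop-0.20.2.tar.gz
```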
iii. Configure Hadoop
Make the following additions to the corresponding files in /usr/local/hadoop/conf. Run these as huser:
$ vim conf/core-site.xml
* core-site.xml
(inside the configuration tags)
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
$ vim conf/mapred-site.xml
* mapred-site.xml
(inside the configuration tags)
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
$ vim conf/hdfs-site.xml
* hdfs-site.xml
(inside the configuration tags)
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
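If the three site files still hold only the stock, empty configuration, each can be written in one command instead of editing in vim. A minimal sketch under that assumption: write_site_xml is a hypothetical helper, and it overwrites the whole file, so do not use it on a file you have already customised.

```shell
# Hypothetical helper: (over)write a Hadoop *-site.xml file containing a
# single property inside the <configuration> element.
write_site_xml() {
    file=$1 name=$2 value=$3
    cat > "$file" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>$name</name>
    <value>$value</value>
  </property>
</configuration>
EOF
}

# Usage, as huser from /usr/local/hadoop (same values as above):
#   write_site_xml conf/core-site.xml   fs.default.name    hdfs://localhost:8020
#   write_site_xml conf/mapred-site.xml mapred.job.tracker localhost:8021
#   write_site_xml conf/hdfs-site.xml   dfs.replication    1
```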
$ vim conf/hadoop-env.sh
* hadoop-env.sh
Uncomment the JAVA_HOME export command and set the path to your Java home:
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
Adjust the JAVA_HOME path to match the Java installation on your system.
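The same edit can be made without opening an editor. A minimal sketch, assuming the file still contains an "export JAVA_HOME=" line, commented out or not (the stock file ships it commented out); set_hadoop_java_home is a hypothetical name and sed -i is the GNU sed in-place option available on Ubuntu.

```shell
# Hypothetical helper: set JAVA_HOME in hadoop-env.sh in place.
# Matches "export JAVA_HOME=..." with or without leading "#" characters.
set_hadoop_java_home() {
    env_file=$1 java_home=$2
    # Fail if there is no JAVA_HOME line to replace.
    grep -q '^#* *export JAVA_HOME=' "$env_file" || return 1
    sed -i "s|^#* *export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file"
}

# Usage, as huser from /usr/local/hadoop:
#   set_hadoop_java_home conf/hadoop-env.sh /usr/lib/jvm/java-6-openjdk-amd64
```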
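iv. Format the NameNode
(as huser only)
$ cd /usr/local/hadoop
$ bin/hadoop namenode -format
You will get output like the following. This sample was captured on a different machine running Hadoop 1.2.0 (see the version line), so the host name, version and paths will differ on yours: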
29/06/13 07:24:20 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = srv1.tecadmin.net/192.168.1.90
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.0
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473; compiled by 'hortonfo' on Mon May 6 06:59:37 UTC 2013
STARTUP_MSG: java = 1.7.0_17
************************************************************/
29/06/13 07:24:20 INFO util.GSet: Computing capacity for map BlocksMap
29/06/13 07:24:20 INFO util.GSet: VM type = 32-bit
29/06/13 07:24:20 INFO util.GSet: 2.0% max memory = 1013645312
13/06/02 22:53:48 INFO util.GSet: capacity = 2^22 = 4194304 entries
29/06/13 07:24:20 INFO util.GSet: recommended=4194304, actual=4194304
29/06/13 07:24:20 INFO namenode.FSNamesystem: fsOwner=hadoop
13/06/02 22:53:49 INFO namenode.FSNamesystem: supergroup=supergroup
29/06/13 07:24:20 INFO namenode.FSNamesystem: isPermissionEnabled=true
29/06/13 07:24:20 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
29/06/13 07:24:20 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
29/06/13 07:24:20 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
29/06/13 07:24:20 INFO namenode.NameNode: Caching file names occuring more than 10 times
29/06/13 07:24:20 INFO common.Storage: Image file of size 112 saved in 0 seconds.
29/06/13 07:24:20 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/opt/hadoop/hadoop/dfs/name/current/edits
29/06/13 07:24:20 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/opt/hadoop/hadoop/dfs/name/current/edits
29/06/13 07:24:20 INFO common.Storage: Storage directory /opt/hadoop/hadoop/dfs/name has been successfully formatted.
29/06/13 07:24:20 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at srv1.tecadmin.net/192.168.1.90
************************************************************
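The line that matters in all of that output is the "has been successfully formatted" message. A minimal sketch that saves the output and checks for it; format_succeeded is a hypothetical helper and /tmp/format.log an arbitrary file name.

```shell
# Hypothetical helper: succeed if a saved "namenode -format" log contains
# the storage-directory success message shown in the sample output.
format_succeeded() {
    grep -q 'has been successfully formatted' "$1"
}

# Usage, as huser from /usr/local/hadoop:
#   bin/hadoop namenode -format 2>&1 | tee /tmp/format.log
#   format_succeeded /tmp/format.log && echo "HDFS formatted"
```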
v. Start Hadoop Services
$ bin/start-all.sh
The output will look like this:
huser@ubuntu:/usr/local/hadoop$ bin/start-all.sh
starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-huser-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-huser-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-huser-secondarynamenode-ubuntu.out
starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-huser-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-huser-tasktracker-ubuntu.out
huser@ubuntu:/usr/local/hadoop$
vi. Test and Access Hadoop Services
Check that all daemons are running:
$ jps
You should see something like this:
26049 SecondaryNameNode
25929 DataNode
26399 Jps
26129 JobTracker
26249 TaskTracker
25807 NameNode
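Since jps (shipped with the JDK) lists each Java process by its main class name, the check can be automated by looking for the five daemon names. A minimal sketch; missing_daemons is a hypothetical helper. If a daemon is missing, its log under /usr/local/hadoop/logs usually explains why.

```shell
# Hypothetical helper: read `jps` output on stdin and print the Hadoop 0.20
# daemons that are NOT running, one per line (prints nothing if all are up).
missing_daemons() {
    running=$(awk '{print $2}')            # second column = class name
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        printf '%s\n' "$running" | grep -qx "$d" || echo "$d"
    done
}

# Usage, as huser:
#   jps | missing_daemons
```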
Web Access URLs for Services
http://localhost:50030/ for the JobTracker
http://localhost:50070/ for the NameNode
http://localhost:50060/ for the TaskTracker
Creating HOME Paths for Hadoop and Java in the .bashrc File
As huser, run:
$ vim $HOME/.bashrc
Press i and add the following lines at the end of the file, then save with Esc and :wq:
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop

# Set JAVA_HOME (JAVA_HOME is also set directly for Hadoop in conf/hadoop-env.sh)
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64

# Add the Hadoop bin/ directory to PATH so "hadoop" works from any directory
export PATH=$PATH:$HADOOP_HOME/bin

# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat "$1" | lzop -dc | head -1000 | less
}
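The hadoop command (and therefore the fs and hls aliases) is only found from any directory if $HADOOP_HOME/bin is on your PATH, for example through an "export PATH=$PATH:$HADOOP_HOME/bin" line in ~/.bashrc. A minimal sketch for checking that in a new terminal; path_contains is a hypothetical helper that tests for an exact PATH entry rather than a substring.

```shell
# Hypothetical helper: succeed if directory $1 is an exact entry of $PATH.
# Wrapping PATH in colons lets the first and last entries match too.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage, in a new terminal (or after: source ~/.bashrc):
#   path_contains "$HADOOP_HOME/bin" && command -v hadoop
```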
That's it. After opening a new terminal (or running $ source ~/.bashrc), you can run Hadoop commands from any directory.
Congratulations, you have successfully configured Hadoop on Ubuntu!
