Configuring Hadoop
Single-Node Cluster on CentOS 6.3 (64-bit)
Step 1: Download, install, and configure Java 1.6
Step 2: Download and configure Hadoop 0.20.2
Step 3: Download and configure Eclipse
Step 1:
I. In CentOS it is easy to install JDK 1.6; follow the steps below.
i.
Go to (on the top panel) System --> Administration --> Add/Remove Software (it may ask for authentication; confirm and continue) --> in the Search box type "JDK", then click Find.
ii.
Search for "OpenJDK Development Environment" (1.6 version), select it, then click Apply.
iii.
Once the installation is done, open a terminal and type:
java -version
(If a version number is printed, Java was installed successfully.)
II. Setting the JAVA_HOME Environment Variable
To set it for the current session, type in a terminal:
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64
(Note: there must be no space after the = sign.)
To make it permanent, type:
$ vi ~/.bash_profile
Press i to enter insert mode and add the same line:
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64
Then press Esc, type :wq, and press Enter to save and quit.
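The edit above can also be scripted. A minimal sketch (assuming the OpenJDK path shown in this guide, which may differ on your machine) that appends the export line to ~/.bash_profile and reloads it:

```shell
# Append JAVA_HOME to ~/.bash_profile so it persists across sessions.
# The JDK path below is the one from this guide; verify it exists first.
JDK_PATH=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64
echo "export JAVA_HOME=$JDK_PATH" >> ~/.bash_profile
. ~/.bash_profile
# Confirm Java is reachable (prints the version if installed)
if command -v java >/dev/null 2>&1; then java -version; fi
```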
Step 2:
I. Create a system user account to use for the Hadoop installation.
# useradd huser
# passwd huser
Changing password for user huser.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
II. Configuring Key-Based Login
Hadoop requires its user to be able to SSH to itself without a password. The following commands enable key-based login for the huser account.
# su - huser
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
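The four commands above can be wrapped into one idempotent helper. A sketch (the function name is ours, not a standard tool), where DIR would normally be ~/.ssh:

```shell
# setup_ssh_key DIR: create a passwordless RSA keypair in DIR (if one is
# missing) and authorize it for logins, with the permissions sshd requires.
setup_ssh_key() {
  dir="$1"
  mkdir -p "$dir" && chmod 700 "$dir"
  # -N "" gives an empty passphrase, so ssh will not prompt
  [ -f "$dir/id_rsa" ] || ssh-keygen -q -t rsa -N "" -f "$dir/id_rsa"
  cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"
  chmod 0600 "$dir/authorized_keys"
}
# Usage, as huser:  setup_ssh_key ~/.ssh   then test with:  ssh localhost
```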
III. Download the stable version 0.20.2 of Hadoop
i.
Go to the link below and download:
http://www.mediafire.com/download/dtv7k2bhvkfoq64/hadoop-0.20.2.tar.gz
ii.
Extract the Hadoop tar file
# mkdir /opt/huser
# cp /home/huser/Downloads/hadoop-0.20.2.tar.gz /opt/huser/
# cd /opt/huser/
# tar -xzf hadoop-0.20.2.tar.gz
# mv hadoop-0.20.2 hadoop
# chown -R huser /opt/huser
# cd /opt/huser/hadoop/
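The copy/extract/rename sequence generalizes. A sketch of a small helper (the name is ours, not part of Hadoop) that unpacks a release tarball under a prefix:

```shell
# install_tarball TARBALL PREFIX NAME: unpack TARBALL under PREFIX and
# rename its top-level directory to NAME (e.g. hadoop-0.20.2 -> hadoop).
install_tarball() {
  tarball="$1"; prefix="$2"; name="$3"
  mkdir -p "$prefix"
  tar -xzf "$tarball" -C "$prefix"
  # The first path component in the archive is the top-level directory
  top=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
  [ "$top" = "$name" ] || mv "$prefix/$top" "$prefix/$name"
}
# Usage (as root), matching the manual steps above:
#   install_tarball /home/huser/Downloads/hadoop-0.20.2.tar.gz /opt/huser hadoop
#   chown -R huser /opt/huser
```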
iii.
Configure Hadoop
Make the following additions to the corresponding files in the conf/ directory of the Hadoop installation:
* core-site.xml (inside the <configuration> tags)
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
* mapred-site.xml (inside the <configuration> tags)
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
* hdfs-site.xml (inside the <configuration> tags)
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
* hadoop-env.sh
Uncomment the JAVA_HOME export line and set it to your Java home:
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64
(Adjust the JAVA_HOME path to match your system's Java installation; again, no space after the = sign.)
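If you prefer to script the edits, the three XML additions above can be generated from a shell function. A sketch (the helper name is ours; the property values are the ones from this guide) that writes minimal single-node config files into a given conf directory -- note it overwrites any existing copies:

```shell
# write_single_node_conf DIR: write the three minimal single-node config
# files shown above into DIR (overwrites existing files of the same name).
write_single_node_conf() {
  dir="$1"
  mkdir -p "$dir"
  cat > "$dir/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
EOF
  cat > "$dir/mapred-site.xml" <<'EOF'
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
EOF
  cat > "$dir/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
}
# Usage: write_single_node_conf /opt/huser/hadoop/conf
```

Remember that hadoop-env.sh (the JAVA_HOME line) still needs the manual edit described above.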
iv.
Format the Name Node
# su - huser
$ cd /opt/huser/hadoop
$ bin/hadoop namenode -format
You will see output similar to the following:
29/06/13 07:24:20 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = srv1.tecadmin.net/192.168.1.90
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.0
************************************************************/
...
29/06/13 07:24:20 INFO namenode.FSNamesystem: fsOwner=hadoop
29/06/13 07:24:20 INFO namenode.FSNamesystem: supergroup=supergroup
29/06/13 07:24:20 INFO namenode.FSNamesystem: isPermissionEnabled=true
...
29/06/13 07:24:20 INFO common.Storage: Image file of size 112 saved in 0 seconds.
29/06/13 07:24:20 INFO common.Storage: Storage directory /opt/hadoop/hadoop/dfs/name has been successfully formatted.
29/06/13 07:24:20 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at srv1.tecadmin.net/192.168.1.90
************************************************************/
(Host name, paths, timestamps, and version numbers will differ on your machine; the line to look for is "has been successfully formatted".)
v.
Start Hadoop Services
$ cd /opt/huser/hadoop
$ bin/start-all.sh
vi.
Test and Access Hadoop Services
$ jps
You should see output like this:
26049 SecondaryNameNode
25929 DataNode
26399 Jps
26129 JobTracker
26249 TaskTracker
25807 NameNode
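To check the listing programmatically, a small filter can be used (our own helper, reading `jps` output on stdin) that prints any expected daemon that is missing:

```shell
# missing_daemons: read a jps listing on stdin and print each expected
# Hadoop daemon that does not appear in it (no output = all running).
missing_daemons() {
  listing=$(cat)
  for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    echo "$listing" | grep -qw "$d" || echo "$d"
  done
}
# Usage: jps | missing_daemons
```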
Web Access URLs for the Services
http://localhost:50030/ for the JobTracker
http://localhost:50070/ for the NameNode
http://localhost:50060/ for the TaskTracker
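You can also probe the three web UIs from the terminal. A sketch using curl (a status of 200 means the UI is up; 000 means the port is not reachable, e.g. when the services are not running):

```shell
# check_ui PORT: print the HTTP status code of a local Hadoop web UI
check_ui() {
  curl -s -o /dev/null --max-time 2 -w '%{http_code}' "http://localhost:$1/" || true
}
for port in 50030 50070 50060; do
  echo "port $port -> HTTP $(check_ui "$port")"
done
```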
Tip: add the Hadoop bin directory to your PATH:
export PATH=$PATH:/opt/huser/hadoop/bin
(Add this line to ~/.bash_profile to make it permanent.)
That's it; you can now run Hadoop commands from any directory in the terminal.
Congratulations, you have successfully configured Hadoop on CentOS!