Friday, 8 November 2013

Eclipse with Hadoop on Ubuntu

Hi friends, this tutorial is for you.

How to configure the Eclipse IDE for Hadoop 0.20.2 on Ubuntu 12.04 LTS


After successfully configuring the Hadoop setup on Ubuntu, follow the steps below to set up Eclipse.

(see the step-by-step documentation at the link below)
http://mytechlearners.com


Step 1: Download Eclipse Europa from the link below




Step 2: Copy the Eclipse archive from Downloads to /usr/local

sudo cp $HOME/Downloads/<eclipse-file-name> /usr/local/

Step 3: Extract the Eclipse archive in the /usr/local directory

First change directory to /usr/local:

$ cd /usr/local   (enter)
$ sudo tar -xzvf   <eclipse-file-name>
$ ls   (you will see a directory named eclipse)

Now open the /usr/local/eclipse directory from the file system (not from the Terminal) and launch eclipse (the file with the diamond icon).

Step 4: Copy the Hadoop Eclipse plugin JAR into the Eclipse plugins directory

$ sudo cp /usr/local/hadoop/contrib/eclipse-plugin/<jar-file-name> /usr/local/eclipse/plugins

Now restart (close and reopen) Eclipse, then go to Window --> Open Perspective --> Other --> select Map/Reduce and click OK.

Step 5: Right-click in the Map/Reduce Locations view and select 'New Hadoop Location'.

In the window that appears, type in 'HDFS Cluster' for the Location name. Under Map/Reduce Master, type in localhost for the Host and 9101 for the Port. For DFS Master, make sure the 'Use M/R Master host' checkbox is selected, and type in 9100 for the Port. For the User name, type in 'User'. Click 'Finish'. Note that the two ports must match whatever your Hadoop configuration actually uses: the Map/Reduce Master port corresponds to mapred.job.tracker in mapred-site.xml, and the DFS Master port to fs.default.name in core-site.xml.
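If you are unsure which ports your installation uses (tutorial setups often use 9000/9001 or 8020/8021 instead of 9100/9101), you can check the configuration files directly; the path below assumes Hadoop is installed under /usr/local/hadoop as in Step 4:

$ grep -A 1 "mapred.job.tracker" /usr/local/hadoop/conf/mapred-site.xml
$ grep -A 1 "fs.default.name" /usr/local/hadoop/conf/core-site.xml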

Step 6: In the Project Explorer window on the left, you should now be able to expand the DFS Locations tree and see your new location. Continue to expand it and you should see something like the following file structure:
         DFS Locations
                 -> HDFS Cluster (1)
                          -> (2)
                                   -> tmp (1)
                                   -> huser (0)

At this point you can create directories and files and upload them to HDFS from Eclipse, or you can create them from the terminal with the hadoop fs commands.
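For example, a minimal terminal sketch (the file sample.txt and the input directory are only placeholders; adjust the paths to your own setup):

$ /usr/local/hadoop/bin/hadoop fs -mkdir input
$ /usr/local/hadoop/bin/hadoop fs -put $HOME/sample.txt input/
$ /usr/local/hadoop/bin/hadoop fs -ls input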









Wednesday, 9 October 2013

Hadoop Important URLs Information


Hadoop Information

Who Can Learn?

It is appropriate for developers, system administrators, managers, architects, or anyone who wants to jump start their adventure with Hadoop. Prior Hadoop knowledge is not required.



Introduction

      Apache Hadoop is an open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license. It enables applications to work with thousands of computationally independent computers and petabytes of data. Hadoop was derived from Google's MapReduce and Google File System (GFS) papers.
           The entire Apache Hadoop “platform” is now commonly considered to consist of the Hadoop kernel, MapReduce and HDFS, as well as a number of related projects, including Apache Hive, Apache HBase, and others.
         Hadoop is a top-level Apache project being built and used by a global community of contributors, written in the Java programming language. The Apache Hadoop project and its related projects (Hive, HBase, ZooKeeper, and so on) have many contributors from across the ecosystem.


History

             Hadoop was created by Doug Cutting and Michael J. Cafarella. Doug, who has served as chief architect at Cloudera since 2009, named it after his son's toy elephant. It was originally developed to support distribution for the Nutch search engine project.


Architecture

             Hadoop consists of Hadoop Common, which provides access to the filesystems supported by Hadoop. The Hadoop Common package contains the necessary JAR files and scripts needed to start Hadoop. The package also provides source code, documentation, and a contribution section that includes projects from the Hadoop community.
For effective scheduling of work, every Hadoop-compatible filesystem should provide location awareness: the name of the rack (more precisely, of the network switch) where a worker node is. Hadoop applications can use this information to run work on the node where the data is, and, failing that, on the same rack/switch, reducing backbone traffic. The Hadoop Distributed File System (HDFS) uses this when replicating data, to try to keep different copies of the data on different racks. The goal is to reduce the impact of a rack power outage or switch failure so that even if these events occur, the data may still be readable.
[Figure: A multi-node Hadoop cluster]
A small Hadoop cluster will include a single master and multiple worker nodes. The master node consists of a JobTracker, TaskTracker, NameNode, and DataNode. A slave or worker node acts as both a DataNode and TaskTracker, though it is possible to have data-only worker nodes, and compute-only worker nodes; these are normally only used in non-standard applications. Hadoop requires JRE 1.6 or higher. The standard startup and shutdown scripts require ssh to be set up between nodes in the cluster.
In a larger cluster, the HDFS is managed through a dedicated NameNode server to host the filesystem index, and a secondary NameNode that can generate snapshots of the namenode's memory structures, thus preventing filesystem corruption and reducing loss of data. Similarly, a standalone JobTracker server can manage job scheduling. In clusters where the Hadoop MapReduce engine is deployed against an alternate filesystem, the NameNode, secondary NameNode and DataNode architecture of HDFS is replaced by the filesystem-specific equivalent.

Monday, 26 August 2013

Hadoop Singlenode Cluster on CentOS

Configuring a Hadoop Single-Node Cluster on CentOS 6.3 (64-bit)

Step 1:
      Download, install, and configure Java 1.6

Step 2:
      Download and configure Hadoop 0.20.2

Step 3:
      Download and configure Eclipse

Step 1:

             I.     In CentOS, it is very easy to install JDK 1.6; follow the steps below

 i.     Go to (at the top panel) System --> Administration --> Add/Remove Software (sometimes it asks for authentication; confirm and continue), then in the search box type JDK and click Find
ii.     Search for the OpenJDK Development Environment (1.6 version), select it, and then click Apply
iii.     Once the installation is done, open a terminal and type java -version (if you get a version number, Java was installed successfully); a terminal-based alternative is sketched below
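If you prefer to install from the terminal instead of the GUI, a roughly equivalent command (assuming the standard CentOS 6 package names) is:

# yum install java-1.6.0-openjdk java-1.6.0-openjdk-devel
$ java -version   (to confirm the install)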

           II.     Setting Environment Variables in the Terminal

For the current session, export JAVA_HOME (adjust the path to match your installed JDK):

export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64

To make the setting permanent, in the terminal type --->    $ vi ~/.bash_profile   (enter)
you will see the vi editor --> press i and add the same line:
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64
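A quick way to confirm the variable is picked up (the path shown assumes the OpenJDK directory used above):

$ source ~/.bash_profile
$ echo $JAVA_HOME
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64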



Step 2:

             I.     Create a system user account to use for the Hadoop installation.


# useradd huser
# passwd huser

Changing password for user huser.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.


           II.     Configuring Key-Based Login

The hadoop user (huser) must be able to ssh to itself without a password. The following commands enable key-based login for the huser account.


# su - huser
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
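To verify that key-based login works, ssh to localhost as huser; the first connection may ask you to confirm the host key, but it should not prompt for a password:

# su - huser
$ ssh localhost
$ exit   (leave the ssh session)
$ exit   (return to root)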


         III.     Download the stable version 0.20.2 of Hadoop

 i.     Go to the link below and download the tarball

http://www.mediafire.com/download/dtv7k2bhvkfoq64/hadoop-0.20.2.tar.gz
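Alternatively, the same release can be fetched from the terminal with wget (the URL below assumes the Apache release archive layout for 0.20.2):

$ wget https://archive.apache.org/dist/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz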





ii.     Extract the Hadoop tar file

# mkdir /opt/huser
# cp /home/huser/Downloads/hadoop-0.20.2.tar.gz /opt/huser/
# cd /opt/huser/
# tar -xzf hadoop-0.20.2.tar.gz
# mv hadoop-0.20.2 hadoop
# chown -R huser /opt/huser
# cd /opt/huser/hadoop/
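A quick check that the extraction worked; you should see directories such as bin, conf, and lib:

# ls /opt/huser/hadoop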

iii.     Configure Hadoop

Make the following additions to the corresponding files in /opt/huser/hadoop/conf:
         * core-site.xml (inside the configuration tags)
                 <property>
                          <name>fs.default.name</name>
                          <value>hdfs://localhost:8020</value>
                 </property>
         * mapred-site.xml (inside the configuration tags)
                 <property>
                          <name>mapred.job.tracker</name>
                          <value>localhost:8021</value>
                 </property>
         * hdfs-site.xml (inside the configuration tags)
                 <property>
                          <name>dfs.replication</name>
                          <value>1</value>
                 </property>
         * hadoop-env.sh
                 uncomment the JAVA_HOME export command and set the path to your Java home:

export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64

Set the JAVA_HOME path according to your system's Java installation.
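If you are unsure of the exact JDK directory on your machine, listing /usr/lib/jvm usually reveals it (the path above is only an example for 64-bit OpenJDK 1.6):

# ls /usr/lib/jvm/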

iv.     Format Name Node

# su - huser
$ cd /opt/huser/hadoop
$ bin/hadoop namenode -format   (you will see output similar to the following)
29/06/13 07:24:20 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = srv1.tecadmin.net/192.168.1.90
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.0
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473; compiled by 'hortonfo' on Mon May 6 06:59:37 UTC 2013
STARTUP_MSG:   java = 1.7.0_17
************************************************************/
29/06/13 07:24:20 INFO util.GSet: Computing capacity for map BlocksMap
29/06/13 07:24:20 INFO util.GSet: VM type = 32-bit
29/06/13 07:24:20 INFO util.GSet: 2.0% max memory = 1013645312
13/06/02 22:53:48 INFO util.GSet: capacity = 2^22 = 4194304 entries
29/06/13 07:24:20 INFO util.GSet: recommended=4194304, actual=4194304
29/06/13 07:24:20 INFO namenode.FSNamesystem: fsOwner=hadoop
13/06/02 22:53:49 INFO namenode.FSNamesystem: supergroup=supergroup
29/06/13 07:24:20 INFO namenode.FSNamesystem: isPermissionEnabled=true
29/06/13 07:24:20 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
29/06/13 07:24:20 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
29/06/13 07:24:20 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
29/06/13 07:24:20 INFO namenode.NameNode: Caching file names occuring more than 10 times
29/06/13 07:24:20 INFO common.Storage: Image file of size 112 saved in 0 seconds.
29/06/13 07:24:20 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/opt/hadoop/hadoop/dfs/name/current/edits
29/06/13 07:24:20 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/opt/hadoop/hadoop/dfs/name/current/edits
29/06/13 07:24:20 INFO common.Storage: Storage directory /opt/hadoop/hadoop/dfs/name has been successfully formatted.
29/06/13 07:24:20 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at srv1.tecadmin.net/192.168.1.90
************************************************************/

 v.     Start Hadoop Services

$ bin/start-all.sh


vi.     Test and Access Hadoop Services

$ jps   (you should see output like the following)

26049 SecondaryNameNode
25929 DataNode
26399 Jps
26129 JobTracker
26249 TaskTracker
25807 NameNode

Web Access URLs for the Services
http://localhost:50030/   for the JobTracker
http://localhost:50070/   for the NameNode
http://localhost:50060/   for the TaskTracker

Tip: add the Hadoop bin directory to your PATH environment variable
export PATH=$PATH:/opt/huser/hadoop/bin
That's it; now you can run the hadoop command from anywhere in the terminal.
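As a quick smoke test (the directory name below is only an example), create a directory in HDFS, upload a file, and list it; bin/stop-all.sh shuts the daemons down when you are finished:

$ hadoop fs -mkdir input
$ hadoop fs -put /etc/hosts input/
$ hadoop fs -ls input
$ stop-all.sh   (when you want to stop the cluster)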


Congratulations, you have successfully configured Hadoop on CentOS.

