Configuring Hadoop
Single-Node Cluster on CentOS 6.3 (64-bit)
Step 1: Download, install, and configure Java 1.6
Step 2: Download and configure Hadoop 0.20.2
Step 3: Download and configure Eclipse
Step 1:
I. In CentOS it is easy to install JDK 1.6; follow the steps below.
i.
Go to (on the top panel) System --> Administration --> Add/Remove Software (it may ask for authentication; confirm and continue) --> in the Search box type "JDK", then click Find.
ii.
Search for "OpenJDK Development Environment" (1.6 version), select it, then click Apply.
iii.
Once the installation is done, open a terminal and type:
java -version
(If a version number is printed, Java was installed successfully.)
II. Setting the JAVA_HOME Environment Variable
To set it for the current session, type in a terminal:
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64
(Note: there must be no space after the = sign.)
To make it permanent, type:
$ vi ~/.bash_profile
Press i to enter insert mode and add the same line:
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64
Then press Esc, type :wq, and press Enter to save and quit.
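The edit above can also be scripted. A minimal sketch (assuming the OpenJDK path shown in this guide, which may differ on your machine) that appends the export line to ~/.bash_profile and reloads it:

```shell
# Append JAVA_HOME to ~/.bash_profile so it persists across sessions.
# The JDK path below is the one from this guide; verify it exists first.
JDK_PATH=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64
echo "export JAVA_HOME=$JDK_PATH" >> ~/.bash_profile
. ~/.bash_profile
# Confirm Java is reachable (prints the version if installed)
if command -v java >/dev/null 2>&1; then java -version; fi
```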
Step 2:
I. Create a system user account to use for the Hadoop installation.
# useradd huser
# passwd huser
Changing password for user huser.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
II. Configuring Key-Based Login
Hadoop requires its user to be able to SSH to itself without a password. The following commands enable key-based login for the huser account.
# su - huser
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
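The four commands above can be wrapped into one idempotent helper. A sketch (the function name is ours, not a standard tool), where DIR would normally be ~/.ssh:

```shell
# setup_ssh_key DIR: create a passwordless RSA keypair in DIR (if one is
# missing) and authorize it for logins, with the permissions sshd requires.
setup_ssh_key() {
  dir="$1"
  mkdir -p "$dir" && chmod 700 "$dir"
  # -N "" gives an empty passphrase, so ssh will not prompt
  [ -f "$dir/id_rsa" ] || ssh-keygen -q -t rsa -N "" -f "$dir/id_rsa"
  cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"
  chmod 0600 "$dir/authorized_keys"
}
# Usage, as huser:  setup_ssh_key ~/.ssh   then test with:  ssh localhost
```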
III. Download the stable version 0.20.2 of Hadoop
i.
Go to the link below and download:
http://www.mediafire.com/download/dtv7k2bhvkfoq64/hadoop-0.20.2.tar.gz
ii.
Extract the Hadoop tar file
# mkdir /opt/huser
# cp /home/huser/Downloads/hadoop-0.20.2.tar.gz /opt/huser/
# cd /opt/huser/
# tar -xzf hadoop-0.20.2.tar.gz
# mv hadoop-0.20.2 hadoop
# chown -R huser /opt/huser
# cd /opt/huser/hadoop/
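The copy/extract/rename sequence generalizes. A sketch of a small helper (the name is ours, not part of Hadoop) that unpacks a release tarball under a prefix:

```shell
# install_tarball TARBALL PREFIX NAME: unpack TARBALL under PREFIX and
# rename its top-level directory to NAME (e.g. hadoop-0.20.2 -> hadoop).
install_tarball() {
  tarball="$1"; prefix="$2"; name="$3"
  mkdir -p "$prefix"
  tar -xzf "$tarball" -C "$prefix"
  # The first path component in the archive is the top-level directory
  top=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
  [ "$top" = "$name" ] || mv "$prefix/$top" "$prefix/$name"
}
# Usage (as root), matching the manual steps above:
#   install_tarball /home/huser/Downloads/hadoop-0.20.2.tar.gz /opt/huser hadoop
#   chown -R huser /opt/huser
```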
iii.
Configure Hadoop
Make the following additions to the corresponding files in the conf/ directory of the Hadoop installation:
* core-site.xml (inside the <configuration> tags)
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
* mapred-site.xml (inside the <configuration> tags)
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
* hdfs-site.xml (inside the <configuration> tags)
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
* hadoop-env.sh
Uncomment the JAVA_HOME export line and set it to your Java home:
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64
(Adjust the JAVA_HOME path to match your system's Java installation; again, no space after the = sign.)
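If you prefer to script the edits, the three XML additions above can be generated from a shell function. A sketch (the helper name is ours; the property values are the ones from this guide) that writes minimal single-node config files into a given conf directory -- note it overwrites any existing copies:

```shell
# write_single_node_conf DIR: write the three minimal single-node config
# files shown above into DIR (overwrites existing files of the same name).
write_single_node_conf() {
  dir="$1"
  mkdir -p "$dir"
  cat > "$dir/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
EOF
  cat > "$dir/mapred-site.xml" <<'EOF'
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
EOF
  cat > "$dir/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
}
# Usage: write_single_node_conf /opt/huser/hadoop/conf
```

Remember that hadoop-env.sh (the JAVA_HOME line) still needs the manual edit described above.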
iv.
Format the Name Node
# su - huser
$ cd /opt/huser/hadoop
$ bin/hadoop namenode -format
You will see output similar to the following:
29/06/13 07:24:20 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = srv1.tecadmin.net/192.168.1.90
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.0
************************************************************/
...
29/06/13 07:24:20 INFO namenode.FSNamesystem: fsOwner=hadoop
29/06/13 07:24:20 INFO namenode.FSNamesystem: supergroup=supergroup
29/06/13 07:24:20 INFO namenode.FSNamesystem: isPermissionEnabled=true
...
29/06/13 07:24:20 INFO common.Storage: Image file of size 112 saved in 0 seconds.
29/06/13 07:24:20 INFO common.Storage: Storage directory /opt/hadoop/hadoop/dfs/name has been successfully formatted.
29/06/13 07:24:20 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at srv1.tecadmin.net/192.168.1.90
************************************************************/
(Host name, paths, timestamps, and version numbers will differ on your machine; the line to look for is "has been successfully formatted".)
v.
Start Hadoop Services
$ cd /opt/huser/hadoop
$ bin/start-all.sh
vi.
Test and Access Hadoop Services
$ jps
You should see output like this:
26049 SecondaryNameNode
25929 DataNode
26399 Jps
26129 JobTracker
26249 TaskTracker
25807 NameNode
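To check the listing programmatically, a small filter can be used (our own helper, reading `jps` output on stdin) that prints any expected daemon that is missing:

```shell
# missing_daemons: read a jps listing on stdin and print each expected
# Hadoop daemon that does not appear in it (no output = all running).
missing_daemons() {
  listing=$(cat)
  for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    echo "$listing" | grep -qw "$d" || echo "$d"
  done
}
# Usage: jps | missing_daemons
```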
Web Access URLs for the Services
http://localhost:50030/ for the JobTracker
http://localhost:50070/ for the NameNode
http://localhost:50060/ for the TaskTracker
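You can also probe the three web UIs from the terminal. A sketch using curl (a status of 200 means the UI is up; 000 means the port is not reachable, e.g. when the services are not running):

```shell
# check_ui PORT: print the HTTP status code of a local Hadoop web UI
check_ui() {
  curl -s -o /dev/null --max-time 2 -w '%{http_code}' "http://localhost:$1/" || true
}
for port in 50030 50070 50060; do
  echo "port $port -> HTTP $(check_ui "$port")"
done
```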
Tip: add the Hadoop bin directory to your PATH:
export PATH=$PATH:/opt/huser/hadoop/bin
(Add this line to ~/.bash_profile to make it permanent.)
That's it; you can now run Hadoop commands from any directory in the terminal.
Congratulations, you have successfully configured Hadoop on CentOS!