SETTING UP HADOOP DAEMONS ON WINDOWS OS
Step 1: Install and Configure Cygwin 2.x
Step 2: Install and Configure Java 1.6
Step 3: Install and Configure Hadoop 0.20.2
Step 4: Install and Configure Eclipse Europa
Step 1:
1. Download 'setup.exe' from the Cygwin website.
2. Right-click on 'setup.exe' and select 'Run as Administrator'.
3. Leave the settings as they are and click through until you come to the package selection window.
3.1 - Make sure that the installation directory is 'C:\cygwin'.
4. In the package selection window, select the following packages and install them:
i) openssh
ii) openssl
iii) diffutils
iv) tcp_wrappers
5. Click 'Next', then go do something productive while the installation runs its course.
6. Once installed, go to Start -> All Programs -> Cygwin, right-click on the shortcut and select 'Run as Administrator'. This is needed in order to allow the sshd account to operate without a passphrase, which is required for Hadoop to work.
7. Once the prompt window opens up, type 'ssh-host-config' and hit Enter.
8. Should privilege separation be used? NO
9. Do you want to install sshd as a service? YES
10. Enter the value of CYGWIN for the daemon: <LEAVE BLANK, JUST HIT ENTER>
11. Do you want to use a different name? (default is 'cyg_server'): NO
12. Please enter the password for user 'cyg_server': <LEAVE BLANK, JUST HIT ENTER>
13. Reenter: <LEAVE BLANK, JUST HIT ENTER>
At this point the ssh service should be installed, set to run under the 'cyg_server' account. Don't worry, this is all handled under the hood.
To start the ssh service, open Services, right-click 'CYGWIN sshd', select Start, and click OK. The next time you log in, it will start automatically.
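You can also start the service from the Cygwin prompt itself; either of these standard commands works:
$ cygrunsrv -S sshd
$ net start sshd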
To test, type 'ssh localhost' in your Cygwin window. You should not be prompted for anything.
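If you do get prompted for a password, a common fix (this is the standard single-node Hadoop setup) is to generate a passphrase-less key pair and authorize it, assuming the default ~/.ssh locations:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Then try 'ssh localhost' again.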
14. Set the Cygwin path in My Computer --> Properties --> Advanced System Settings --> Environment Variables --> edit the PATH system variable as shown below.
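For example (assuming the default C:\cygwin install directory; '<existing value>' stands for whatever is already in the variable):
Variable name:  PATH
Variable value: <existing value>;C:\cygwin\bin;C:\cygwin\usr\sbin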
Step 2:
1. Download Java 1.6 from the Oracle website. If you can't find it, check this link:
http://www.oracle.com/technetwork/java/javase/downloads/index.html
2. Right-click on the downloaded Java installer and select 'Run as Administrator'.
Note: When installing, change the install path for both the JDK and JRE files to something like c:/Java/<jdk or jre>. A path without spaces avoids problems with Hadoop's shell scripts later.
3. Leave the settings as they are and click through until you come to the Finish prompt.
4. Set the Java home path in My Computer --> Properties --> Advanced System Settings --> Environment Variables --> create a NEW user variable as shown below.
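For example (assuming the install path from the note above):
Variable name:  JAVA_HOME
Variable value: C:\Java\jdk1.6.0
You may also want to append %JAVA_HOME%\bin to PATH so that 'java -version' works from any prompt.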
Step 3:
This assumes installation of version 0.20.2 of Hadoop. Newer versions (e.g. 0.20.20x.x) do not get along with Windows OS: the tasktracker daemon in particular requires permissions that Windows OS inherently does not allow.
1. Download the stable version 0.20.2 of Hadoop.
2. Using 7-Zip (download it if you have not already, and make it your default archive browser), open up the archive file. Copy the top-level directory from the archive and paste it into your Cygwin tree, usually somewhere like C:\cygwin\usr\local.
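Alternatively, if you would rather skip 7-Zip, you can unpack the tarball directly from a Cygwin window (assuming the download landed in your home directory and you are extracting to /usr/local):
$ tar xzf hadoop-0.20.2.tar.gz -C /usr/local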
3. Once copied into your Cygwin tree, navigate to {hadoop-home}/conf. Open the following files for editing in your favorite editor (I strongly suggest Notepad++ ... why would you use anything else):
* core-site.xml
* hdfs-site.xml
* mapred-site.xml
* hadoop-env.sh
4. Make the following additions to the corresponding files:
* core-site.xml (inside the configuration tags)
  <property>
    <name>fs.default.name</name>
    <value>localhost:9100</value>
  </property>
* mapred-site.xml (inside the configuration tags)
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9101</value>
  </property>
* hdfs-site.xml (inside the configuration tags)
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
* hadoop-env.sh
  * uncomment the JAVA_HOME export line, and set the path to your Java home (typically /cygdrive/c/Java/jdk1.6.0/)
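The uncommented line should end up looking something like this (using the JDK path from Step 2; note that Cygwin paths are written as /cygdrive/c/... rather than C:\...):
export JAVA_HOME=/cygdrive/c/Java/jdk1.6.0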
5. In a Cygwin window, inside your top-level hadoop directory, it's time to format your Hadoop file system. Type 'bin/hadoop namenode -format' and hit Enter. This will create and format the HDFS.
6. Now it is time to start all of the Hadoop daemons that will simulate a distributed system. Type 'bin/start-all.sh' and hit Enter.
You should not receive any errors (there may be some messages about not being able to change to the home directory, but this is OK).
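A quick sanity check (assuming the JDK's bin directory is on your PATH) is the JDK's 'jps' tool, which lists running Java processes:
$ jps
After a successful start you should see five daemons listed alongside Jps itself: NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker.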
Double-check that your HDFS and JobTracker are up and running properly by visiting http://localhost:50070 and http://localhost:50030, respectively.
To make sure everything is working end to end, let's try a regex example.
7. From the top-level hadoop directory, type in the following set of commands:
$ bin/hadoop dfs -mkdir input
$ bin/hadoop dfs -put conf input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ bin/hadoop dfs -cat output/*
This should display the output of the job: a count of each string in the input that matched the regex pattern above.
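If you would rather examine the results locally, you can also copy the output directory out of HDFS first:
$ bin/hadoop dfs -get output output
$ cat output/*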
8. Assuming no errors, you are all set to set up your Eclipse environment.
FYI, you can stop all of your daemons by typing 'bin/stop-all.sh', but keep them running for now as we move on to the next step.
Step 4:
1. Download Eclipse Europa
2. Locate the Hadoop Eclipse plugin jar. Normally you can use the plugin provided in the contrib folder that comes with Hadoop 0.20.2 (typically {hadoop-home}/contrib/eclipse-plugin/).
3. Copy the jar and paste it into your Eclipse plugins directory (e.g. C:/eclipse/plugins).
4. In a regular command prompt, navigate to your eclipse folder (e.g. 'cd C:/eclipse') and launch Eclipse; running 'eclipse -clean' makes it pick up the newly added plugin.
5. Once Eclipse is open, open a new perspective (top right corner) and select 'Other'. From the list, select 'MapReduce'.
6. Go to Window -> Show View, and select Map/Reduce. This will open a view window for Map/Reduce Locations.
7. Now you are ready to tie Eclipse in with the existing HDFS that you formatted and configured earlier. Right-click in the Map/Reduce Locations view and select 'New Hadoop Location'.
8. In the window that appears, type 'localhost' for the Location name. Under Map/Reduce Master, type localhost for the Host and 9101 for the Port. For DFS Master, make sure the 'Use M/R Master host' checkbox is selected, and type 9100 for the Port. For the User name, type 'User'. Click 'Finish'. (These ports match the mapred.job.tracker and fs.default.name values you set in Step 3.)
9. In the Project Explorer window on the left, you should now be able to expand the DFS Locations tree and see your new location. Continue to expand it and you should see something like the following file structure:
DFS Locations
-> localhost (1)
   -> tmp (1)
      -> hadoop-{username} (1)
         -> mapred (1)
            -> system (1)
               -> jobtracker.info
At this point you can create directories and files and upload them to the HDFS from Eclipse, or you can create them through the Cygwin window as you did in step 7 in the previous section.
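For example, a minimal sketch from the Cygwin window (the directory and file names here are just placeholders):
$ bin/hadoop dfs -mkdir testdir
$ bin/hadoop dfs -put myfile.txt testdir
$ bin/hadoop dfs -ls testdir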