Hadoop is an open source project for processing large datasets in parallel on clusters of low-cost commodity machines.
Hadoop is built on two main parts: a special file system called the Hadoop Distributed File System (HDFS) and the MapReduce framework.
HDFS is a file system optimized for storing very large datasets on commodity hardware and serving them to distributed processing jobs.
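As a small illustration, here is a minimal sketch of copying a file into HDFS through the org.apache.hadoop.fs.FileSystem Java API. The NameNode address (hdfs://localhost:9000), the class name, and the file paths are assumptions for the example, not values from this post.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCopyExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; replace with your cluster's fs.defaultFS.
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical paths: copy a local file into HDFS and verify it landed.
        Path local = new Path("input.txt");
        Path remote = new Path("/user/demo/input.txt");
        fs.copyFromLocalFile(local, remote);
        System.out.println("Exists in HDFS: " + fs.exists(remote));
        fs.close();
    }
}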
The MapReduce framework processes the data in two main phases: the Map phase and the Reduce phase.
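To make the two phases concrete, below is a sketch of the classic word-count job written against the org.apache.hadoop.mapreduce API: the Map phase emits a (word, 1) pair for every word in the input, and the Reduce phase sums those counts per word. The class names and input/output paths are illustrative.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: split each input line into words and emit (word, 1).
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum all counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Between the two phases the framework shuffles and sorts the intermediate pairs, so all counts for a given word arrive at the same reducer.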