
Hadoop is a filing system and MapReduce. Explained.


http://www.bigdataplanet.info/2013/10/hadoop-tutorials-part-1-what-is-hadoop.html

biggest filing cabinet

Hadoop was derived from the research papers Google published on the Google File System (GFS) and Google's MapReduce. Accordingly, Hadoop has two integral parts: the Hadoop Distributed File System (HDFS) and Hadoop MapReduce.


Hadoop Distributed File System (HDFS)

HDFS is a filesystem designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.

What is the difference between a normal filesystem and the Hadoop Distributed File System?

The two major differences between HDFS and other filesystems are:

Block Size: Every disk has a block size, which is the minimum amount of data that can be read from or written to the disk. A filesystem also works in blocks, and each filesystem block is built from these disk blocks. Disk blocks are normally 512 bytes, while filesystem blocks are a few kilobytes. HDFS has the concept of blocks too, but here a block is 64 MB by default (128 MB in Hadoop 2.x and later). The size is configurable and is commonly raised to 128 MB, 256 MB, 512 MB, or even more, depending on the requirements and use case.
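To make the block-size arithmetic concrete, here is a minimal Python sketch. It assumes the 64 MB default mentioned above; the function names hdfs_block_count and split_into_blocks are made up for illustration and are not part of any Hadoop API. It also assumes that the last block of a file only holds the bytes that are left over, rather than being padded to a full block.

```python
MB = 1024 * 1024  # bytes in one megabyte


def hdfs_block_count(file_size_bytes, block_size_bytes=64 * MB):
    """Return how many blocks a file of the given size would need."""
    if file_size_bytes <= 0:
        return 0
    # Integer ceiling division: a partly filled last block still counts.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


def split_into_blocks(file_size_bytes, block_size_bytes=64 * MB):
    """Return a list of (offset, length) pairs, one per block."""
    blocks = []
    offset = 0
    while offset < file_size_bytes:
        # The final block only holds whatever bytes remain.
        length = min(block_size_bytes, file_size_bytes - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


# A 1 GB file at the 64 MB default, then with 128 MB blocks.
print(hdfs_block_count(1024 * MB))            # 16
print(hdfs_block_count(1024 * MB, 128 * MB))  # 8

# Block lengths (in MB) for a 200 MB file.
print([length // MB for _, length in split_into_blocks(200 * MB)])  # [64, 64, 64, 8]
```

For comparison, the same 1 GB file on a filesystem with 4 KB blocks would span 262,144 blocks, which gives a feel for how much coarser HDFS blocks are.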

So why are these blocks so large in HDFS? Keep reading and you will find out in the next few tutorials :)

Metadata Storage: In a normal filesystem, metadata is stored hierarchically. Say there is a folder ABC, inside it another folder DEF, and inside that a file hello.txt. The information about hello.txt (its metadata) is kept with DEF, and the metadata of DEF is in turn kept with ABC. This forms a hierarchy that is maintained all the way up to the root of the filesystem. HDFS does not spread metadata out like this. All the metadata resides on a single machine known as the NameNode (or master node) of the cluster. This node holds all the information about the files and folders in the cluster, and a lot of other information too, which we will learn about in the next few tutorials. :)
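The idea of one machine holding all the metadata can be sketched as a toy model in Python. This is only an illustration, not how Hadoop is implemented: the class ToyNameNode, its methods, the datanode names, and the round-robin placement of blocks are all assumptions made for the example, and it ignores details such as block replication.

```python
MB = 1024 * 1024  # bytes in one megabyte


class ToyNameNode:
    """Hypothetical stand-in for the NameNode: one object holds all metadata."""

    def __init__(self, datanodes, block_size=64 * MB):
        if not datanodes:
            raise ValueError("at least one datanode is required")
        self.datanodes = list(datanodes)
        self.block_size = block_size
        self.file_blocks = {}     # full file path -> list of block ids
        self.block_location = {}  # block id -> name of the datanode storing it
        self._next_block_id = 0

    def add_file(self, path, size_bytes):
        """Record a file's blocks and where each one lives."""
        n_blocks = (size_bytes + self.block_size - 1) // self.block_size
        block_ids = []
        for _ in range(n_blocks):
            block_id = self._next_block_id
            self._next_block_id += 1
            # Assumed placement policy: simple round-robin over the datanodes.
            node = self.datanodes[block_id % len(self.datanodes)]
            self.block_location[block_id] = node
            block_ids.append(block_id)
        self.file_blocks[path] = block_ids
        return block_ids

    def locate(self, path):
        """Answer 'where is this file?' from the central metadata alone."""
        return [(b, self.block_location[b]) for b in self.file_blocks[path]]


namenode = ToyNameNode(["datanode-1", "datanode-2", "datanode-3"])
namenode.add_file("/ABC/DEF/hello.txt", 150 * MB)
print(namenode.locate("/ABC/DEF/hello.txt"))
# [(0, 'datanode-1'), (1, 'datanode-2'), (2, 'datanode-3')]
```

The point of the sketch is that a client asks a single place for the whole picture of a file, while the other machines in the cluster only store the blocks themselves.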

Some time ago, I was also confused about MapReduce, and this was the best explanation I could find: http://ksat.me/map-reduce-a-really-simple-introduction-kloudo/
