November 1, 2013
The Hadoop Distributed File System (HDFS) is designed specifically for storing very large files with streaming access patterns, running on clusters of commodity hardware. It is fault tolerant, scalable, and simple to expand. HDFS has three main daemons: the NameNode, the DataNode, and the Secondary NameNode.
In this blog, we look at these daemons and at the read and write operations using rack topology.
The NameNode daemon runs on a master server and manages the file system metadata of HDFS. When a NameNode starts up, it reads the HDFS state from an image file, fsimage, and then applies the edits recorded in the edits log file. It then writes the new HDFS state back to fsimage and starts normal operation with an empty edits file. Both the fsimage and the edits log files are stored in the native (local) file system of the NameNode host. Clients contact the NameNode for file metadata or file modifications, and perform the actual file I/O directly with the DataNodes. The NameNode executes file system namespace operations such as opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes.
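The startup sequence described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class name `ToyNameNode`, the dictionary layout, and the three operation names are hypothetical and do not reflect the real fsimage or edits log formats.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Toy model of NameNode startup (hypothetical, not the real HDFS formats)."""
    fsimage: dict = field(default_factory=dict)  # path -> list of block ids
    edits: list = field(default_factory=list)    # pending (op, path, arg) records

    def start(self) -> dict:
        # 1. Load the last saved namespace from the image.
        namespace = dict(self.fsimage)
        # 2. Replay every logged edit in the order it was recorded.
        for op, path, arg in self.edits:
            if op == "create":
                namespace[path] = list(arg)
            elif op == "rename":
                namespace[arg] = namespace.pop(path)
            elif op == "delete":
                namespace.pop(path, None)
            else:
                raise ValueError(f"unknown op: {op}")
        # 3. Save the merged state as the new image and start with empty edits.
        self.fsimage = namespace
        self.edits = []
        return namespace


nn = ToyNameNode(
    fsimage={"/logs/a.txt": ["blk_1"]},
    edits=[
        ("create", "/logs/b.txt", ["blk_2", "blk_3"]),
        ("rename", "/logs/a.txt", "/logs/old.txt"),
    ],
)
state = nn.start()
```

After `start()` returns, the namespace holds the renamed file and the newly created one, and the edits list is empty, mirroring the "new fsimage, empty edits file" state in the paragraph above.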
The DataNode daemon runs on slave nodes and stores the actual data in HDFS. There are a number of DataNodes, usually one per node in the cluster, which manage the storage attached to the nodes they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks, and these blocks are stored in a set of DataNodes. The DataNodes perform block creation, deletion, and replication as instructed by the NameNode. Active DataNodes are reported as live nodes, and inactive DataNodes as dead nodes.
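To make the block splitting concrete, here is a minimal sketch. The function names, the block size, the replication factor, and the round-robin placement are all assumptions made for illustration; the actual HDFS placement policy is rack-aware and is decided by the NameNode.

```python
def split_into_blocks(file_size: int, block_size: int) -> list[int]:
    """Return the size in bytes of each block for a file of file_size bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    # Only the last block may be smaller than block_size.
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_blocks(num_blocks: int, datanodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct DataNodes, round-robin.

    Round-robin is a simplification chosen for this sketch, not the HDFS policy.
    """
    if not 0 < replication <= len(datanodes):
        raise ValueError("replication must be between 1 and the number of DataNodes")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


# A 300-byte file with a 128-byte block size (tiny values, for readability).
sizes = split_into_blocks(300, 128)
placement = place_blocks(len(sizes), ["dn1", "dn2", "dn3", "dn4"], replication=3)
```

Here `sizes` is `[128, 128, 44]`, and `placement` records which DataNodes hold a replica of each block, which is the kind of block-to-DataNode mapping the NameNode keeps.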
The Secondary NameNode daemon takes periodic checkpoints of the NameNode: it merges the fsimage and the edits log files and so keeps the edits log size within a limit. It is usually run on a different machine than the primary NameNode, since its memory requirements are on the same order as those of the primary NameNode. The Secondary NameNode stores the latest checkpoint in a directory that is structured the same way as the primary NameNode's directory, so that the checkpointed image is always ready to be read by the primary NameNode if necessary.
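The checkpointing behaviour can be sketched roughly as follows. The class `ToySecondaryNameNode`, its thresholds, and the simplified edit records are hypothetical names and values chosen for this example, not Hadoop settings. The sketch only shows the idea of merging once the edits log grows past a limit or enough time has passed.

```python
from dataclasses import dataclass, field


@dataclass
class ToySecondaryNameNode:
    """Toy checkpointer (hypothetical; thresholds are illustrative values)."""
    max_edits: int = 3              # merge once this many edits are pending
    period_s: float = 3600.0        # or once this many seconds have passed
    checkpoint: dict = field(default_factory=dict)
    last_checkpoint_at: float = 0.0

    def should_checkpoint(self, num_edits: int, now: float) -> bool:
        too_many_edits = num_edits >= self.max_edits
        too_old = (now - self.last_checkpoint_at) >= self.period_s
        return too_many_edits or too_old

    def maybe_checkpoint(self, fsimage: dict, edits: list, now: float) -> bool:
        """Merge fsimage and edits if a threshold is reached; return True if merged."""
        if not self.should_checkpoint(len(edits), now):
            return False
        merged = dict(fsimage)
        for path, blocks in edits:
            # In this sketch an edit either sets a path's blocks or removes it (None).
            if blocks is None:
                merged.pop(path, None)
            else:
                merged[path] = blocks
        self.checkpoint = merged
        self.last_checkpoint_at = now
        edits.clear()  # the edits log starts over after a checkpoint
        return True
```

Calling `maybe_checkpoint` with a short edits list returns `False` and leaves everything untouched; once the list reaches `max_edits` entries, or `period_s` seconds have elapsed, the merged image is stored in `checkpoint` and the list is emptied.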
Fig. 1 – HDFS Architecture
Fig. 2 – Write Operation
Fig. 3 – Read Operation