01 Nov 2013
3 MINS READ
Hadoopdistributed file system (HDFS) which is specifically designed for very large file storing with very large streaming access patterns running on clusters of commodity hardware. Hadoop is fault tolerant, scalable and extremely simple to expand.It has three main daemons namely
In this blog, we would be reading about these daemons and read/write operation using rack topology.
NameNode demon runs on a master server that manages the metadata information of the hadoop. When a NameNode starts up, it reads HDFS state from an image, fsimage, and then applies edits from the edits log file. It then writes new HDFS state to the fsimage and starts normal operation with an empty edits file. Both fsimage and edits log files present in the native file system of the hadoop. Clients contact NameNode for file metadata or file modifications and perform actual file I/O directly with the DataNodes. The NameNode executes file system namespace operations like opening, closing and renaming files and directories. It also determines the mapping of blocks to DataNodes.
DataNode demon runs on slave nodes and stores the actually data inside the HDFS, there are a number of Datanodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks and these blocks are stored in a set of Data Nodes. The DataNodes perform block creation, deletion, and replication of data as per the NameNode’s Instruction. The active datanodes are live nodes and the inactive datenodes are dead nodes.
The secondary NameNode demon is to take a snapshot of Name node which merges the fsimage and the edits log files periodically and keeps edits log size within a limit. It is usually run on a different machine than the primary NameNode since its memory requirements are on the same order as the primary NameNode. The secondary NameNode stores the latest checkpoint in a directory which is structured the same way as the primary NameNode’s directory So that the check pointed image is always ready to be read by the primary NameNode if necessary.
Fig. 1 – HDFS Architecture
Fig. 2 Write Operation
Fig. 3 Read Operation
About the Author
BI & Analytics
13 Nov 2020
07 Sep 2020
11 Jun 2020
28 May 2020
08 May 2020
24 Apr 2020
13 Apr 2020
06 Apr 2020
31 Mar 2020
26 Mar 2020
23 Jun 2017
06 Aug 2015
13 Jul 2015
28 Oct 2014
17 Apr 2014
24 Mar 2014
22 Jan 2014
20 Dec 2013
26 Sep 2013
03 Sep 2013
26 Aug 2013
29 Apr 2013
04 Mar 2013
21 Feb 2013
04 Feb 2013
03 Jan 2013
26 Nov 2010
19 Mar 2009
Digital Assurance
02 Jan 2012
17 Feb 2012
Infrastructure Mgmt. Services
02 Mar 2012
06 Feb 2013
Digital Assurance, Enterprise Solutions
14 Feb 2013
18 Feb 2013
27 Feb 2013
Others
01 Mar 2013
Enterprise Solutions
05 Mar 2013
18 Mar 2013
Digital Assurance, Enterprise Solutions, Others
22 Mar 2013
12 Apr 2013
26 Apr 2013
13 May 2013
11 Jun 2013
17 Jun 2013
25 Jun 2013
19 Aug 2013
27 Aug 2013
10 Sep 2013
19 Sep 2013
24 Sep 2013
30 Sep 2013
01 Oct 2013
03 Oct 2013
19 Nov 2013
Enterprise Solutions, Manufacturing and Consumer
28 Nov 2013
03 Dec 2013
03 Jan 2014
27 Jan 2014
31 Jan 2014
12 Feb 2014
13 Feb 2014
20 Mar 2014
11 Jun 2014
Manufacturing and Consumer
26 Jun 2014
30 Jun 2014
10 Jul 2014
15 Jul 2014
16 Jul 2014
18 Jul 2014
26 Aug 2015
28 Sep 2015
07 Oct 2015
26 Oct 2015
07 Mar 2016
22 Mar 2016
13 May 2016
23 May 2016
Application Transformation Mgmt.
11 Jul 2016
25 Aug 2016
03 Sep 2016
14 Sep 2016
15 Nov 2016
22 Nov 2016
25 Nov 2016
Business Process Services
25 Apr 2017
Banking and Financial Services
18 May 2017
30 May 2017
27 Jun 2017
18 Jul 2017
26 Oct 2017
Healthcare, Insurance
28 Nov 2017
11 Dec 2017
25 Jan 2018
21 Feb 2018
14 Mar 2018
( Mandatory field * )
The information you provide will be used in accordance with our terms ofPrivacy Policy
Please Check on "I Agree" to register for the blog.