Big Data and its ecosystem are among today's buzzwords in analytics. Because the technology is still new, it often creates inhibitions and doubts in the minds of learners. Let me explain the idea in a simple manner by comparing Big Data and its components with the ecosystem of animals and plants in a forest. Think of it as an adventurous Big Data safari.
Big Data Animal Planet:
In a forest, millions of species live and flourish side by side. They can be broadly classified as plants (in the Big Data Animal Planet, these are the structured, unstructured and semi-structured data) and animals (the Hadoop components).
Apache Hadoop Ecosystem:
Hadoop is a framework for storing Big Data, performing large-scale operations on it and deriving intelligence from it. It is a 100% open-source project administered by the Apache Software Foundation for reliable, scalable and distributed computing. Its design was inspired by papers published by Google, and leading companies such as Yahoo and Facebook have contributed to its development.
There are also some additional components that do not map to the animal planet.
- Flume – Represented as sea water, it collects log data from different sources and moves it into Hadoop.
- Sqoop – Transfers bulk data between Hadoop and structured data stores such as relational databases.
- Oozie – Workflow scheduler for interdependent Hadoop jobs.
- Avro – Data serialization framework.
We will look more closely at the role of each Hadoop ecosystem component in a typical Business Intelligence system in my next blog post.