Big Data Supermarket
Data & Analytics
Last Updated: August 26, 2013
As we all know, a supermarket is a large store where we buy groceries, food, and household items, all organized in aisles, and every supermarket has its own style. Big data distributions work the same way: private vendors (the grocery stores) package the Hadoop ecosystem (the food and household products) and offer enterprises a wide range of big data services that fit into a typical business intelligence system. In this blog, we are going to push a trolley through the various big data supermarkets.
What are the Big Data Distributions (Supermarkets)?
Cloudera Hadoop Distribution (CDH): A leading big data platform built on Apache Hadoop, offering an enterprise-ready distribution of Hadoop and related projects.
IBM InfoSphere BigInsights: Brings the power of Hadoop to the enterprise with enhanced functionality and improved consumability, making it easier to use Hadoop with existing skills, build big data applications, and uncover insights in your data.
MapR: Delivers on the promise of Hadoop, making managing and analyzing Big Data a reality for more business users. MapR Distribution brings unprecedented dependability, speed and ease-of-use to Hadoop.
Hortonworks Data Platform (HDP): Positioned as the most stable and reliable Apache Hadoop distribution, built and packaged by the core architects, builders, and operators of Hadoop.
Karmasphere: Provides everything data professionals need to put Big Data to work. From predicting customer needs to tracking user behavior to optimizing development processes, Karmasphere was built to help you turn data into a strategic asset.
Where does the Hadoop ecosystem (products) fit into the typical BI system?
Layers of Big Data
- Big Data Store
- Big Data Processing Engine
- Big Data Integration
- Big Data Programming
- Big Data Insight
- Big Data Management
Big Data Store:
Component | Description |
---|---|
Hadoop Distributed File System (HDFS) | Highly fault-tolerant file system designed to be deployed on low-cost commodity hardware. Data is distributed across the nodes of the cluster. |
Hive | Data warehouse system built on top of Hadoop for analyzing large datasets. |
HBase | Column-oriented database system for random, real-time read/write access. |
Big Data Processing Engine:
Component | Description |
---|---|
MapReduce | Programming model for processing large data sets in parallel across a cluster of commodity machines, running the computation on the nodes where the distributed file system stores the data. |
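To make the programming model concrete, here is a minimal single-process sketch of the classic word-count job. It is only an illustration: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and are not part of any Hadoop API, and a real job would run the map and reduce steps in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of text records."""
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data store", "big data insight"]))
```

Because each map call looks at one record and each reduce call looks at one key, the framework is free to spread both steps over many machines, which is where the parallelism comes from.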
Big Data Integration:
Component | Description |
---|---|
Flume | Distributed service that collects data from different sources. |
Sqoop | Imports data from an RDBMS into Hadoop and exports it back. |
Hiho | Moves data between databases and Hadoop. |
Chukwa | Tool for displaying, monitoring, and analyzing the results of large collections of logs. |
Big Data Programming:
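Component | Description |
---|---|
Pig | High-level platform whose dataflow scripting language (Pig Latin) is compiled into jobs that run on the MapReduce engine. |
HiveQL | SQL-like query language for accessing data in Hive. |
Jaql | Functional query language from IBM for JSON and other semi-structured data; in BigInsights it is also used with a built-in annotator library to run text analytics on Hadoop. |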
Big Data Insight:
Component | Description |
---|---|
Mahout | Library of core algorithms for clustering, classification, and collaborative filtering on large data sets. |
Hue | User interface framework and software development kit (SDK) for visual Hadoop applications. |
Beeswax | User interface for querying and analyzing data in Hive. |
Big Data Management:
Component | Description |
---|---|
ZooKeeper | Coordination service for distributed applications. |
Oozie | Workflow scheduler system for managing Hadoop jobs. |
Whirr | Set of libraries offering a cloud-neutral way to run services. |
Big Data Distribution (Supermarkets) Comparison:
Features | MapR | IBM BigInsights | Cloudera | Hortonworks | Karmasphere |
---|---|---|---|---|---|
NameNode high availability | Available | Available | Available | Available | Integrated with MapR |
Connector to social media | Flume | Social Data Accelerator | Flume | Flume | No |
Administration and monitoring | Not available | Available | Available | Available | Available |
On cloud | Google Cloud Platform and Amazon EMR | IBM SmartCloud Enterprise | No | No | Karmasphere Analytics for EMR |
Machine learning | Mahout | Machine Data Accelerator | Mahout | No | No |
Web user interface | Available | Available | Available | Available | Available |
Cluster setup | Easy | Easy | Easy | Easy | Medium |
Product | MapR Edition | InfoSphere BigInsights | CDH | Hortonworks Data Platform | Karmasphere Studio |
Windows support | No | No | No | Yes | Yes |
Text analysis | Informatica HParser | Annotation Query Language (AQL) | No | No | Ability to use SAS, SPSS, and R analytic models |