Big Data Supermarket

Data & AI Solutions

August 26, 2013

As we all know, a supermarket is a big store where we buy groceries, food and household items, all organized in aisles, and every supermarket has its own style. Big Data distributions work the same way. Private vendors (the grocery stores) offer enterprises a wide range of big data services built on the Hadoop ecosystem (the food and household products), and these fit alongside a typical Business Intelligence system. In this blog, we are going to push a trolley through various big data supermarkets.

What are the Big Data Distributions (Supermarkets)?

Cloudera Hadoop Distribution (CDH): A leading big data platform built on Apache Hadoop, offering a 100% enterprise-ready distribution of Hadoop and related projects.

IBM InfoSphere BigInsights: Brings the power of Hadoop to the enterprise with enhanced functionality and improved consumability, making it easier to use Hadoop with existing skills, build big data applications and uncover insights in your data.

MapR: Delivers on the promise of Hadoop, making the management and analysis of Big Data a reality for more business users. The MapR Distribution brings unprecedented dependability, speed and ease of use to Hadoop.

Hortonworks Data Platform (HDP): Positioned as the most stable and reliable Apache Hadoop distribution, built and packaged by the core architects, builders and operators of Hadoop.

Karmasphere: Provides everything data professionals need to put Big Data to work. From predicting customer needs to tracking user behavior to optimizing development processes, Karmasphere was built to help you turn data into a strategic asset.

Where does the Hadoop ecosystem (the products) fit into a typical BI system?

Layers of Big Data

  • Big Data Store
  • Big Data Processing Engine
  • Big Data Integration
  • Big Data Programming
  • Big Data Insight
  • Big Data Management

Big Data Store:

Hadoop Distributed File System (HDFS): A highly fault-tolerant file system designed to be deployed on low-cost hardware. Data is split into blocks that are distributed across the nodes of the cluster.
Hive: A data warehouse system built on top of Hadoop for analyzing large datasets.
HBase: A column-oriented database system that provides random, real-time read/write access to large tables.
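To make "data is distributed across the nodes" concrete, here is a minimal Python sketch of how a file can be cut into fixed-size blocks, with each block copied to several nodes. It assumes a 64 MB block size (the Hadoop 1.x default) and a replication factor of 3 (the HDFS default). The round-robin placement and the node names are hypothetical simplifications; the real NameNode uses a rack-aware placement policy.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 64 * 1024 * 1024  # assumption: Hadoop 1.x default block size (64 MB)
REPLICATION = 3                # assumption: HDFS default replication factor


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs; the last block may be shorter than block_size."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Hypothetical round-robin placement: each block goes to `replication` distinct nodes.

    Real HDFS is rack-aware; this only shows that every block lives on several
    machines, so losing a single node does not lose data.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [nodes[(block_id + i) % len(nodes)] for i in range(replication)]
    return placement


if __name__ == "__main__":
    size = 200 * 1024 * 1024                      # a 200 MB file
    blocks = split_into_blocks(size)              # three 64 MB blocks plus one 8 MB block
    nodes = ["node1", "node2", "node3", "node4"]  # hypothetical DataNode names
    for block_id, replicas in place_replicas(len(blocks), nodes).items():
        offset, length = blocks[block_id]
        print(block_id, offset, length, replicas)
```

With four nodes and three replicas per block, any single node can fail and every block still has two live copies.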

Big Data Processing Engine:

MapReduce: A programming model for processing large datasets on a cluster of commodity machines. A job is split into map tasks and reduce tasks that run in parallel on the nodes holding the data in the distributed file system.
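Word count is the classic way to illustrate the model. The single-process Python sketch below mimics the three stages (map, shuffle and reduce). It is not the Hadoop Java API and the function names are hypothetical; on a real cluster the framework runs many mappers and reducers in parallel and performs the shuffle over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group intermediate values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values seen for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a cluster, each mapper would process a different input split in parallel.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    # Each reducer would handle its own subset of keys in parallel.
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data lakes"]))
    # {'big': 2, 'data': 2, 'ideas': 1, 'lakes': 1}
```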

Big Data Integration:

Flume: A distributed service that collects data from many different sources and moves it into Hadoop.
Sqoop: Imports data from an RDBMS into Hadoop and exports it back.
Hiho: Moves data between any database and Hadoop.
Chukwa: A powerful tool for displaying, monitoring and analyzing the results of large log collections.
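Sqoop parallelizes an import by choosing a split column (normally the primary key, or the column named with --split-by), reading its minimum and maximum values, and handing each mapper one range of that column (four mappers by default). The Python sketch below imitates that idea, with an in-memory SQLite table standing in for the RDBMS. The orders table and the helper names are hypothetical, and the "mappers" run one after another here rather than as MapReduce tasks.

```python
import sqlite3
from typing import List, Tuple


def compute_splits(lo: int, hi: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Divide the inclusive range [lo, hi] into at most num_mappers contiguous ranges."""
    total = hi - lo + 1
    num_mappers = max(1, min(num_mappers, total))
    size, extra = divmod(total, num_mappers)
    splits = []
    start = lo
    for i in range(num_mappers):
        # The first `extra` ranges get one additional value each.
        end = start + size - 1 + (1 if i < extra else 0)
        splits.append((start, end))
        start = end + 1
    return splits


def import_table(conn: sqlite3.Connection, table: str, split_col: str,
                 num_mappers: int = 4) -> List[List[str]]:
    """Return one list of comma-delimited lines per 'mapper', like part files in HDFS.

    Table and column names are interpolated directly, so they must come from
    trusted configuration, never from user input.
    """
    lo, hi = conn.execute(f"SELECT MIN({split_col}), MAX({split_col}) FROM {table}").fetchone()
    if lo is None:  # empty table: nothing to import
        return []
    outputs = []
    for start, end in compute_splits(lo, hi, num_mappers):
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {split_col} BETWEEN ? AND ? ORDER BY {split_col}",
            (start, end),
        ).fetchall()
        outputs.append([",".join(str(v) for v in row) for row in rows])
    return outputs


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(i, i * 10.0) for i in range(1, 11)])
    for part, lines in enumerate(import_table(conn, "orders", "id")):
        print(part, lines)
```

Splitting on a numeric key keeps each range query independent, which is why the ranges can be fetched in parallel; a badly skewed key would give some mappers far more rows than others.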

Big Data Programming:

Pig: A high-level platform with a dataflow scripting language (Pig Latin) whose scripts run on the MapReduce engine.
HiveQL: The SQL-like query language used to access data in Hive.
Jaql: A query language for JSON data on Hadoop; together with an executable program and a built-in annotator library, it provides text analytics for Hadoop.
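A Pig Latin script reads as a chain of relation-level steps (LOAD, FILTER, GROUP, FOREACH ... GENERATE) that Pig turns into MapReduce jobs. As a rough illustration, the Python sketch below performs the same kind of chain on an in-memory list. The Pig Latin in the comment, the page_views data and the function name are hypothetical examples, not output from Pig itself.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple

# Hypothetical Pig Latin equivalent of the function below:
#   views   = LOAD 'page_views' AS (user:chararray, url:chararray, secs:int);
#   long_v  = FILTER views BY secs >= 30;
#   by_url  = GROUP long_v BY url;
#   counts  = FOREACH by_url GENERATE group AS url, COUNT(long_v) AS visits;


class PageView(NamedTuple):
    user: str
    url: str
    secs: int


def long_visits_per_url(views: List[PageView], min_secs: int = 30) -> Dict[str, int]:
    # FILTER: keep only views that lasted at least min_secs seconds
    long_views = [v for v in views if v.secs >= min_secs]
    # GROUP ... BY url: collect the matching tuples under each url
    by_url: Dict[str, List[PageView]] = defaultdict(list)
    for v in long_views:
        by_url[v.url].append(v)
    # FOREACH ... GENERATE group, COUNT(...): one output row per group
    return {url: len(group) for url, group in by_url.items()}


if __name__ == "__main__":
    views = [
        PageView("ann", "/home", 45),
        PageView("bob", "/home", 10),
        PageView("ann", "/cart", 31),
        PageView("cy", "/home", 60),
    ]
    print(long_visits_per_url(views))
```

The same question in HiveQL would be roughly SELECT url, COUNT(*) FROM page_views WHERE secs >= 30 GROUP BY url; (again assuming a hypothetical page_views table). Pig expresses it as explicit steps, Hive as one declarative query.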

Big Data Insight:

Mahout: A library of core machine learning algorithms for clustering, classification and collaborative filtering of large datasets.
Hue: A user interface framework and software development kit (SDK) for building visual Hadoop applications.
Beeswax: A user interface for querying and analyzing data in Hive.
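Clustering is one of the algorithm families Mahout provides, and k-means is a standard example. The sketch below is a plain, single-machine Python version of k-means (Lloyd's algorithm) on 2-D points, meant only to show the assign-then-update loop. It does not use Mahout's API. Mahout's distributed version runs each iteration as a MapReduce job, assigning points in the map phase and recomputing centroids in the reduce phase.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def nearest(point: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points: List[Point], centroids: List[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means starting from the given initial centroids."""
    centroids = list(centroids)
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assignment step ("map"): label each point with its nearest centroid.
        assignment = [nearest(p, centroids) for p in points]
        # Update step ("reduce"): move each centroid to the mean of its points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assignment


if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
    print(kmeans(pts, [(0.0, 0.0), (10.0, 10.0)]))
```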

Big Data Management:

ZooKeeper: A coordination service for distributed applications.
Oozie: A workflow scheduler system for managing Hadoop jobs.
Whirr: A set of libraries for running cloud services in a cloud-neutral way.
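Oozie workflows describe Hadoop jobs as a directed acyclic graph (DAG) of actions, so a job starts only after the jobs it depends on have finished. The Python sketch below shows that ordering rule using the standard library's graphlib module (Python 3.9+). It is not Oozie's XML workflow format, and the job names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def run_order(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group jobs into waves: every job in a wave depends only on jobs from earlier waves.

    Jobs in the same wave could run in parallel (similar to an Oozie fork/join).
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(dependencies)  # maps each job to the jobs it waits for
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


if __name__ == "__main__":
    # Hypothetical pipeline: ingest with Sqoop and Flume, process, then cluster.
    workflow = {
        "sqoop_import": set(),
        "flume_logs": set(),
        "hive_etl": {"sqoop_import", "flume_logs"},
        "pig_clean": {"flume_logs"},
        "mahout_cluster": {"hive_etl", "pig_clean"},
    }
    for i, wave in enumerate(run_order(workflow), start=1):
        print(i, wave)
```

Rejecting cycles up front mirrors why a workflow must be a DAG: a job that (indirectly) waits on itself could never start.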

Big Data Distribution (Supermarkets) Comparison:

Features | MapR | IBM BigInsights | Cloudera | Hortonworks | Karmasphere
NameNode high availability | Available | Available | Available | Available | Integrated with MapR
Social media connector | Flume | Social Data Accelerator | Flume | Flume | No
Administration and monitoring | Not available | Available | Available | Available | Available
On cloud | Google Cloud Platform and Amazon EMR | IBM SmartCloud Enterprise | No | No | Karmasphere Analytics for EMR
Machine learning | Mahout | Machine Data Accelerator | Mahout | No | No
Web user interface | Available | Available | Available | Available | Available
Cluster setup | Easy | Easy | Easy | Easy | Medium
Product | MapR Edition | InfoSphere BigInsights | CDH | Hortonworks Data Platform | Karmasphere Studio
Windows support | No | No | No | Yes | Yes
Text analysis | Informatica HParser | Annotation Query Language (AQL) | No | No | Ability to use SAS, SPSS and R analytic models
