BIG DATA – Beyond the Hype
If one were to run a MapReduce job over a Hadoop Distributed File System (HDFS) holding all the white papers, articles and presentations on Business Intelligence & Analytics created in the past couple of years, and count the most frequently occurring 2-grams (two words that occur together), BIG DATA would almost certainly top the list. So, instead of attempting to define BIG DATA in this blog post, I would like to focus on its business value.
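For readers curious what such a 2-gram count looks like, here is a minimal sketch in plain Python that mimics the map and reduce steps in memory. It is an illustration only, not a Hadoop job: the function names (map_bigrams, reduce_counts, top_bigrams) and the simple tokenization rule are assumptions made for this example.

```python
import re
from collections import Counter
from itertools import chain


def map_bigrams(document):
    """Map step: emit a ((word1, word2), 1) pair for each adjacent word pair."""
    # Assumed tokenization: lowercase, keep runs of letters and digits.
    words = re.findall(r"[a-z0-9]+", document.lower())
    return [((first, second), 1) for first, second in zip(words, words[1:])]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each distinct 2-gram."""
    totals = Counter()
    for bigram, count in pairs:
        totals[bigram] += count
    return totals


def top_bigrams(documents, n=1):
    """Run the map step on every document, then reduce and rank the results."""
    pairs = chain.from_iterable(map_bigrams(doc) for doc in documents)
    return reduce_counts(pairs).most_common(n)


if __name__ == "__main__":
    sample = ["Big data is big data", "big data everywhere"]
    print(top_bigrams(sample))
```

On a real cluster the map and reduce steps would run in parallel across many machines, with the framework grouping the emitted pairs by key between the two steps; the logic of each step stays the same.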
For many years, BI practitioners have managed structured data and harnessed it for insights. Business decision making and Analytics have improved remarkably as the domain has taken a great leap forward. Nevertheless, the focus has always been on structured data, which is typically 20-25% of the data generated by an organization. The remaining 75-80% is unstructured data (text documents, files, etc.), and until now there has been no system, technique or platform to derive insights from it.
With the advent of BIG DATA techniques (in which Hadoop and MapReduce play a big part), businesses can for the first time confidently say that they can build the capability to manage large volumes of data (terabytes to petabytes to exabytes), handle different varieties of data (structured, semi-structured and unstructured), keep up with ever-increasing data velocity, and perform complex analyses that have high variability.
The diagram below illustrates the evolving architectural paradigm of combining structured and unstructured data analysis. The top half shows the unstructured data architecture, while the bottom half shows the BI architecture for structured data analysis. The real value lies in combining the insights from the top and bottom layers for a variety of use cases that truly enable organizations to compete on Analytics.
There are many interesting aspects to the diagram shown above, and we at Hexaware have started working on proofs of concept for our customers that combine the structured and unstructured worlds of data. Each of the components shown in the diagram will be explained in subsequent blog posts.
Thanks for reading. Please do provide your feedback.