Let me reiterate – Big Data is the ‘New Normal’ in Business Intelligence. Now, what is this Big Data?
Before that, let me take a dig at this fashionable phrase called the ‘New Normal’ – Mr Mohammed El-Erian of Pimco first introduced the phrase, and it has since been used by many people to signify a significant shift between the way things ‘were’ and how things ‘will be’ in the future. Here’s an instance of its usage by Microsoft CEO Steve Ballmer in one of his periodic public e-mails.
In the context of BI, Big Data has come to include all the systems & processes around data generation, collation, management, control & usage. Data generated by business transaction systems has been increasing rapidly, and with the advent of social media (Facebook, Twitter, blogs etc.), it has exploded exponentially (a little out of control, if I may add!). We are clearly entering the era of petabytes & exabytes as the ‘New Normal’ for data management systems.
Here are some of the more famous very large data warehouses:
• eBay has a 6.5-petabyte database running on Greenplum and a 2.5-petabyte enterprise data warehouse running on Teradata
• Facebook has a 2.5-petabyte data warehouse running on Hadoop/Hive
• Walmart has a 2.5-petabyte warehouse, Bank of America has 1.5 petabytes, and Dell has 1 petabyte – all running on Teradata
• Yahoo, Fox Interactive Media, and TEOCO (which runs outsourced DWs for top US telcos) are all in the hundreds-of-terabytes range
Since data management forms the core of analytical systems, it is important for BI practitioners to reset (or should I say, re-engineer) their thought process around managing data. Thinking at the scale of petabytes & beyond does alter certain preconceived notions around BI systems for many of us. For example, larger data sets require that we distribute the data itself among many units rather than just distributing the workload. Our notions of reliability, recoverability, consistency, scalability etc. can get turned on their heads by the requirement to handle data in the petabyte and exabyte range.
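To make the data-distribution idea concrete, here is a minimal sketch (in Python) of hash partitioning – the common technique MPP databases and distributed stores use to spread rows, not just queries, across nodes. The node count and customer IDs below are purely hypothetical illustrations, not from any particular product.

```python
import hashlib

def partition_for(key: str, num_nodes: int) -> int:
    """Map a record key to a node using a stable hash.

    MD5 is used here only because it hashes keys uniformly and
    deterministically, not for any security property.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

# Distribute some (hypothetical) customer records across 4 nodes.
NUM_NODES = 4
nodes = {i: [] for i in range(NUM_NODES)}
for customer_id in ["C1001", "C1002", "C1003", "C1004", "C1005"]:
    nodes[partition_for(customer_id, NUM_NODES)].append(customer_id)
```

Because the hash is deterministic, any node can compute where a given key lives without a central lookup – which is exactly why notions like consistency and recoverability change once the data, and not just the work, is spread out.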
Innovations will continue to happen across multiple dimensions to help tame this Big Data. Given below are some dimensions of change I could think of:
1) New data storage & manipulation techniques will continue to unfold – e.g., Hadoop & MapReduce, columnar databases, MPP architectures etc.
2) Divide & conquer data – Organizations will design their business architectures around distributing data across multiple platforms (on-demand & on-premise) to make sense of it all.
3) In-Memory Analytics will help business users analyze large datasets rapidly – faster & more powerful analytics with the proliferation of 64-bit processor families and in-memory BI tools like BO Explorer, QlikView, Microsoft PowerPivot etc.
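To give a flavor of the first dimension above, here is a toy, single-process sketch of the MapReduce pattern using the canonical word-count example. This is only an illustration of the programming model – a real Hadoop job would run the map and reduce tasks in parallel across a cluster, with the framework handling the shuffle between them.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document: str):
    # Map: emit a (word, 1) pair for every word in the document.
    for word in document.lower().split():
        yield (word, 1)

def reduce_phase(pairs):
    # Shuffle + reduce: group pairs by word and sum the counts.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

# Toy "cluster": each document stands in for one input split.
documents = [
    "big data is the new normal",
    "big data needs new thinking",
]
pairs = chain.from_iterable(map_phase(doc) for doc in documents)
word_counts = reduce_phase(pairs)
# e.g. word_counts["big"] == 2 and word_counts["data"] == 2
```

The appeal of the model is that both phases are embarrassingly parallel: each mapper sees only its own split, and each reducer sees only the pairs for its own keys, which is what lets the technique scale from this toy to petabyte-sized inputs.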
I am sure there are many more interesting ideas for managing & making sense of Big Data. Please do share your thoughts.