Can everyone fit in the big data role?

Data & Analytics

April 17, 2014

There is so much hype around big data. People everywhere talk about Hadoop, and the word has reached many around the world in a short time. Organizations put a lot of effort, time and money into building big data capabilities.

The question is: can everyone fit in the big data role, or will only a chosen elite group fit?

There has been a lot of confusion about who is eligible to develop big data skills. Since the Hadoop framework is written in Java, there is still a strong perception that only Java developers can pick up this technology easily and quickly. In fact, Java knowledge is not mandatory for Hadoop-related skills unless one wants to write MapReduce code for various applications.

Let me confine myself to Business Intelligence, the technology I work with, to answer the above question. My answer is a big YES. Every BI resource can fit in. I invite folks from other technologies to extend this line of thinking to their own fields.

You may ask how. Let me explain.

At the end of the day, Hadoop is a data storage and processing environment. Hence, all DW/BI resources who work with data naturally fit in. Many top BI tools offer connectivity to Hadoop. So it is the additional knowledge of how storage works inside Hadoop that will make a BI resource fit the big data role. And of course, the basic operations a BI resource performs on relational databases have to be performed in Hadoop as well.

Big data is about many Vs (Volume, Velocity, Variety, Veracity, and the list keeps growing). Hadoop is a framework to store and process big data. The storage part is handled by HDFS (a distributed file system that holds files of any format, including unstructured data), Hive (a data-warehouse-like layer with SQL-style queries), HBase (a NoSQL store for both structured and unstructured data) and other NoSQL data stores. The processing part is handled by MapReduce: in classic Hadoop, work submitted through higher-level tools such as Hive queries is translated into MapReduce jobs, whether it is a simple aggregation or a complex calculation.
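For BI folks, the map-reduce idea is easiest to see as a familiar GROUP BY. Below is a minimal sketch in plain Python, not the Hadoop API: the sample data and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `total_by_region`) are made up for illustration, and a real cluster would run these phases in parallel across many machines.

```python
from collections import defaultdict

# Hypothetical sales records (region, amount), like rows in a BI fact table.
SALES = [
    ("north", 120.0),
    ("south", 80.0),
    ("north", 30.0),
    ("east", 50.0),
    ("south", 20.0),
]


def map_phase(records):
    """Map: emit one (key, value) pair per input record."""
    for region, amount in records:
        yield region, amount


def shuffle_phase(pairs):
    """Shuffle: group all values that share a key (done by the framework)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into one total."""
    return {key: sum(values) for key, values in grouped.items()}


def total_by_region(records):
    """Equivalent in spirit to: SELECT region, SUM(amount) ... GROUP BY region."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    print(total_by_region(SALES))
```

The point of the sketch is that an aggregation a BI resource writes every day in SQL maps naturally onto the map, shuffle and reduce steps.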

I hope this answer is convincing.

And here are a few roles and skills in which a BI resource can build expertise. For all of the following, knowledge of big data distributions, NoSQL stores and basic Hadoop operations is essential.

Big data solution architect (knowledge of big data tools and BI tools that can talk to Hadoop)
Big data ETL developer (knowledge of ETL tools with Hadoop connectivity)
Big data report developer (knowledge of reporting tools with Hadoop connectivity)
Data scientist (in-depth knowledge of data mining, predictive analysis and text mining, plus domain knowledge)
Big data sales professional (exposure to all of the above skills)

Where do you think you fit in, in the big data roles?

I would be very happy to see your feedback, comments or a different point of view.