Let me start with the defintion of Predictive Analytics as used in literature – “The nontrivial extraction of implicit, previously unknown and potentially useful information from data”. If that doesn’t sound esoteric enough, you are probably more advanced than what this post gives you credit for!
For a BI practitioner, it is important to get an understanding of Predictive Analytics (also known as Data Mining) as this subject definitely deserves a place in the wide spectrum of Business Intelligence disciplines. BI at a broad level is about optimizing business through “Hindsight, Insight and Foresight”. Predictive analytics adds the powerful “Foresight” part to business decision making.
Most BI practitioners tend to equate statistics with predictive analytics and this post explains why such a view is inaccurate. To understand this let’s start at the very beginning (a la Alice in Wonderland). Broadly, this world is divided into 2 types of systems:
- Physical Systems – Has causality and hence can be modeled mathematically with relative ease
- Human Behavioral Systems – Lacks causality and can be modeled only with specialized techniques
Predictive analytics for business decision making is all about modeling human behavioral systems.
Why Traditional Statistics is insufficient?
Though the entry into predictive analytics requires that we understand the implications of traditional statistical analysis, statistics by itself is insufficient in the business context. Traditional statistical analysis allows us to understand the general group behavior and is primarily concerned with common behavior within the group – the central tendencies.
In business we generally develop models to anticipate human behavior of some type. Human behavior is inconsistent, lacks causality and distributions based on human behavior almost always violate the assumptions of traditional statistical analysis (like normal distribution of data, stability of mean and standard deviation etc). The strength of data mining comes from the ability of the associated techniques to deal with the tails of the distributions, rather than the central tendencies, and from the techniques’ ability to deal with the realities of the data in a more precise manner.
In the realm of predictive analytics, we are concerned with modeling human behavior and hence are interested with the tail of our distribution – small percentage of the population that responds to a campaign, commits a fraud, leave our business or purchase the next service.
Though there are specialized techniques used for Predictive Analytics (viz. Non-linear statistics, Induction Algorithms, Cluster Analysis, Neural Networks to name a few), a BI practitioner is only expected to appreciate its usage in different business situations, prepare and model data as required by the tools and interpret the results correctly (a much less daunting task indeed!)
Typically the model development process involves the following steps – a) Define Project, b) Select Data, c) Prepare Data, d) Transform Variables, e) Process Model, f) Validate Model, g) Implement Model. I will explain these steps in more detail in subsequent posts.
Fundamentally, an end-to-end BI view requires the practitioner to learn the concepts around statistics and predictive analytical techniques as available in tools (like say SQL Server Analysis Services) in addition to their technology bag of tricks around data integration, data modeling and OLAP.
Thanks for reading. Please do share your thoughts.