First Step in Knowing your Data - 'Profile It'
The Chief Data Officer (CDO), the protagonist introduced earlier on this blog, has the unenviable task of understanding the data that lies within the organization's boundaries. Having categorized the data into 6 MECE sets (read the post dated May 29 on this blog), the data reconnaissance team starts its mission with the first step: ‘Profiling’.
Data Profiling, at the most fundamental level, involves understanding:
1) How is the data defined?
2) What is the range of values that the data element can take?
3) How is the data element related to others?
4) What is the frequency of occurrence of certain values, etc.
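As a rough illustration of questions 1, 2 and 4, here is a minimal sketch in Python of what a basic profile of a single data element might look like. The function name `profile_column` and the sample `ages` data are hypothetical, made up for this example; relationships between elements (question 3) are not covered here.

```python
from collections import Counter

def profile_column(values):
    """Return a basic profile of one data element (a list of raw values)."""
    # Treat None and empty strings as missing.
    present = [v for v in values if v is not None and v != ""]
    freq = Counter(present)
    profile = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(freq),
        # How the data is actually represented, as observed in the values.
        "types": sorted({type(v).__name__ for v in present}),
        # Most frequently occurring values with their counts.
        "top_values": freq.most_common(3),
    }
    # A min/max range is only reported when every present value is numeric.
    if present and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    ):
        profile["min"] = min(present)
        profile["max"] = max(present)
    return profile

# Hypothetical sample: an "age" column with one missing value.
ages = [34, 45, None, 34, 29, 120, 34]
age_profile = profile_column(ages)
print(age_profile)
```

Even on a toy column like this, the profile surfaces things worth a second look, such as the missing entry and a maximum of 120.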
A slightly more sophisticated definition of Data Profiling would include the following analyses of data elements:
- Compute basic statistics, frequencies, ranges and outliers
- Analyze numeric ranges
- Identify duplicates in name-and-address data as well as in other (non-name-and-address) data
- Identify multiple spellings of the same content
- Identify and validate redundant data and primary/foreign key relationships across data sources
- Validate data-specific business rules within a single record or across sources
- Discover and validate data patterns and formats
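A few of the checks in this list can be sketched in a handful of lines of Python. This is only an illustration under simplifying assumptions: the helper names (`value_pattern`, `spelling_variants`, `orphan_keys`) and the sample data are hypothetical, spelling variants are detected only by ignoring case, spaces and punctuation, and real profiling tools use far more robust matching.

```python
import re
from collections import Counter, defaultdict

def value_pattern(value):
    """Reduce a value to its format: digits become 9, letters become A."""
    s = re.sub(r"[0-9]", "9", str(value))
    return re.sub(r"[A-Za-z]", "A", s)

def pattern_frequencies(values):
    """Count how often each format occurs in a column."""
    return Counter(value_pattern(v) for v in values)

def spelling_variants(values):
    """Group values that are identical once case, spaces and punctuation are ignored."""
    groups = defaultdict(set)
    for v in values:
        key = re.sub(r"[^a-z0-9]", "", v.lower())
        groups[key].add(v)
    # Keep only groups with more than one distinct spelling.
    return {k: sorted(vs) for k, vs in groups.items() if len(vs) > 1}

def orphan_keys(child_keys, parent_keys):
    """Return foreign key values that have no matching primary key."""
    return sorted(set(child_keys) - set(parent_keys))

# Hypothetical sample data.
phones = ["555-0101", "555-0102", "5550103"]
cities = ["New York", "new york", "NewYork", "Boston"]

phone_patterns = pattern_frequencies(phones)
city_variants = spelling_variants(cities)
orphans = orphan_keys(child_keys=[1, 2, 5, 5], parent_keys=[1, 2, 3])

print(phone_patterns)
print(city_variants)
print(orphans)
```

The pattern counts show which formats dominate a column and which are exceptions, the variant groups point at multiple spellings of the same content, and the orphan list flags broken primary/foreign key relationships between two sources.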
Armed with statistical information about critical data present in enterprise-wide systems, the CDO’s team can devise specific strategies to improve the quality of data and hence the quality of information and business decision-making.