Strategies for Testing Data Warehouse Applications

Posted by Vinod G
March 29th, 2012


There is an exponentially increasing cost associated with finding software defects later in the development lifecycle. In data warehousing, this is compounded because of the additional business costs of using incorrect data to make critical business decisions. Given the importance of early detection of software defects, let’s first review some general goals of testing an ETL application:

Below content describes the various common strategies used to test the Data warehouse system:

Data completeness: Ensures that all expected data is loaded in to target table.

1. Compare records counts between source and target..check for any rejected records.
2. Check Data should not be truncated in the column of target table.
3. Check unique values has to load in to the target. No duplicate records should be existing.
4. Check boundary value analysis (ex: only >=2008 year data has to load into the target)

Data Quality:

1.Number check: if in the source format of numbering the columns are as xx_30 but if the target is only 30 then it has to load not pre_fix(xx_) .. we need to validate.
2.  Date Check: They have to follow Date format and it should be same across all the records. Standard format : yyyy-mm-dd etc..
3. Precision Check: Precision value should display as expected in the target table.
Example: In source 19.123456 but in the target it should display as 19.123 or round of 20.
4.  Data Check: Based on business logic, few record which does not meet certain criteria should be filtered out.
Example: only record whose date_sid >=2008 and GLAccount != ‘CM001’ should only load in the
target table.
5. Null Check: Few columns should display “Null” based on business requirement
Example: Termination Date column should display null unless & until if his “Active status”
Column is “T” or “Deceased”.
Note: Data cleanness will be decided during design phase only.

Data cleanness:

Unnecessary columns should be deleted before loading into the staging area.
1.  Example: If a column have name but it is taking extra space , we have to “trim” space so before loading in the staging area with the help of expression transformation space will be trimed.
2. Example: Suppose telephone number and STD code in different columns and requirement says it should be in one column then with the help of expression transformation we will concatenate the values in one column.
Data Transformation: All the business logic implemented by using ETL-Transformation should reflect.

Integration testing:

Ensures that the ETL process functions well with other upstream and downstream processes.


1.  Downstream:Suppose if you are changing precision in one of the transformation “column”, let us assume a “EMPNO” is column having data type with size 16, this data type precision should be same for all transformation where ever this “EMPNO” column is used.
2.  Upstream: If the source is SAP/ BW and we are extracting data there will be ABAP code which will act as interface between SAP/ BW and map where there source is SAP /BW and to modify existing mapping we have to re-generate the ABAP code in the ETL tool (informatica)., if we don’t do it, wrong data will be extracted since ABAP code is not updated.

User-acceptance testing:

Ensures the solution meets users’ current expectations and anticipates their future expectations.
Example: Make sure none of the code should be hardcoded.

Regression testing:

Ensures existing functionality remains intact each time a new release of code is completed.


Taking these considerations into account during the design and testing portions of building a data warehouse will ensure that a quality product is produced and prevent costly mistakes from being discovered in production.

Comments (0)