Data Mart Consolidation - Data Integration

Posted by Muneeswara C Pandian
December 29th, 2009

In the process of Data Mart Consolidation, the second area to focus on, after Data Model Consolidation, is the Data Integration (DI) process. Let us define the following terms:

  • Parent Data Mart (PDM) – The Data Mart that is to be extended with elements from another (Child) Data Mart
  • Child Data Mart (CDM) – The Data Mart that is to be merged into the Parent Data Mart

Much of the information gathered during the Data Model consolidation exercise will be leveraged for consolidating the DI processes. The key aspects that need to be gathered while consolidating the DI processes are:

  1. Collect the source, target and source-to-target relationship (mapping) details from the Data Integration processes of the CDM and PDM
  2. Determine the overlapping DI processes between PDM and CDM. These would be scenarios of
    1. Tables that are sources for both the Data Marts
    2. DI processes that read data from PDM and load it into CDM
    3. The same data transformation logic being applied to the incoming data or to the data already present in the Data Mart
      This step helps determine the DI processes that can be eliminated or merged as part of consolidation
  3. Determine how many of the DI processes in the CDM have source tables that are not accessed by the PDM.
    This step helps determine how many of the existing DI processes from the CDM need to be added to the PDM DI process
  4. Determine the tables and elements in the CDM that are not used in the reporting layer. These tables can be intermediate staging or work tables as well.
    This step helps determine DI processes that can possibly be eliminated
  5. Determine how many of the DI processes in the CDM are built on a different technology from the PDM DI process, such as hand-coded SQL procedures, scripts or other tools
    This step helps determine how many of the existing DI processes in the CDM need to be recoded or converted to the platform of the PDM DI process
  6. Determine any common DI logic used across the CDM and PDM processes, or repeated within the CDM DI processes.
    This step helps in leveraging existing reusable components in the PDM, or in building new reusable components that the CDM DI processes can use as part of the consolidation
  7. Determine how many of the tables in the CDM are currently maintained manually, for example by entering a file arrival status or arrival date to trigger other DI processes.
    This step helps identify candidates for automation and eliminate manual interaction from the DI process.
  8. Determine the key performance-intensive, long-running DI processes in the CDM
    This step helps determine the DI processes that need to be tuned as part of the consolidation
  9. Determine the schedule dependencies of the CDM DI processes and the PDM DI processes. Collect the current PDM DI server utilization details.
    This step helps in preparing an integrated schedule-dependency timeline of the PDM and CDM DI processes within the available DI server window. This is a very critical task.
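
To make a few of these steps concrete, here is a minimal Python sketch of how the mapping inventory from step 1 could drive the overlap checks in steps 2 and 3 and a first-cut load order for step 9. It is only an illustration under stated assumptions: the `DIProcess` record, the helper functions and every process and table name are hypothetical, and a real inventory would normally be extracted from the DI tool's metadata repository rather than typed in by hand. The ordering uses the standard library `graphlib` module, which requires Python 3.9 or later.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class DIProcess:
    """One DI process with the tables it reads and loads (hypothetical record)."""
    name: str
    mart: str             # "PDM" or "CDM"
    sources: frozenset    # tables the process reads
    targets: frozenset    # tables the process loads


def proc(name, mart, sources, targets):
    return DIProcess(name, mart, frozenset(sources), frozenset(targets))


# Step 1: the collected source/target mappings. All names are made up.
PROCESSES = [
    proc("pdm_load_sales", "PDM", {"src.orders", "src.customers"}, {"pdm.fact_sales"}),
    proc("cdm_load_orders", "CDM", {"src.orders"}, {"cdm.fact_orders"}),
    proc("cdm_copy_sales", "CDM", {"pdm.fact_sales"}, {"cdm.sales_copy"}),
    proc("cdm_load_returns", "CDM", {"src.returns"}, {"cdm.fact_returns"}),
]


def tables(procs, mart, attr):
    """Union of the `attr` ("sources" or "targets") tables over one mart's processes."""
    return set().union(*(getattr(p, attr) for p in procs if p.mart == mart))


def analyse(procs):
    pdm_sources = tables(procs, "PDM", "sources")
    pdm_targets = tables(procs, "PDM", "targets")
    cdm = [p for p in procs if p.mart == "CDM"]
    return {
        # Step 2.1: tables that are sources for both Data Marts
        "shared_sources": sorted(pdm_sources & tables(procs, "CDM", "sources")),
        # Step 2.2: CDM processes that read tables loaded by the PDM
        "pdm_to_cdm": [p.name for p in cdm if p.sources & pdm_targets],
        # Step 3: CDM processes whose sources the PDM neither reads nor loads
        "cdm_only": [p.name for p in cdm if not p.sources & (pdm_sources | pdm_targets)],
    }


def load_order(procs):
    """Step 9: one valid run order, derived only from table-level dependencies."""
    # Map each process to the processes that load a table it reads.
    deps = {
        p.name: {q.name for q in procs if q is not p and q.targets & p.sources}
        for p in procs
    }
    return list(TopologicalSorter(deps).static_order())


report = analyse(PROCESSES)
print(report)
print(load_order(PROCESSES))
```

With this sample inventory, `cdm_copy_sales` is flagged as a PDM-to-CDM copy that may become redundant after the merge, `cdm_load_returns` is the only process that would have to be added to the PDM DI process as new work, and `src.orders` is a shared source worth checking for duplicated extraction. The load order only reflects table dependencies; fitting it into the available DI server window still needs the run times and utilization details collected in steps 8 and 9.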

Thanks for reading. Let me know if you have come across other points that need to be considered as part of a data integration consolidation exercise.
