Data Integration Checklist – Environment Setup & Process Design

Data & AI Solutions

March 19, 2009

The following are the key points that need to be considered when setting up a Data Integration (DI) environment.

Data Integration Environment Setup

Repository setup and folder structures to hold the development objects (code) such as transformations, mappings and jobs.
Coding standards and development process.
Document templates for low-level design specifications and for capturing test cases and results.
Version management process of the objects.
Backup and restore process of the repository.
Code migration process to move objects from one environment to another, for example from development to production.
Recommended configuration variables such as commit interval, buffer size, log file path, etc.
User group and security definition
Integration of the database metadata with the DI metadata, and of the DI metadata with the reporting environment
Impact analysis process for change requests
Data security requirements for accessing production data, and the process of sampling data for testing
Roles and responsibilities of the environment users, such as Administrator, Designer, etc.
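
As an illustration of the "recommended configuration variables" item above, the following is a minimal sketch of how such settings might be held per environment. The class name `SessionConfig`, the function `load_config`, the field names and all the values are hypothetical and not taken from any particular DI tool; the actual variables and sensible defaults depend on the tool and hardware in use.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SessionConfig:
    """Hypothetical per-environment settings for a DI job (names are assumptions)."""

    commit_interval: int  # rows written to the target between commits
    buffer_size_kb: int   # memory buffer allotted to the job, in KB
    log_dir: Path         # directory where job logs are written

    def validate(self) -> None:
        # Reject values that could never be valid in any environment.
        if self.commit_interval <= 0:
            raise ValueError("commit_interval must be positive")
        if self.buffer_size_kb <= 0:
            raise ValueError("buffer_size_kb must be positive")


# Illustrative values only; tune them for the actual tool and hardware.
CONFIGS = {
    "development": SessionConfig(1_000, 64, Path("logs/dev")),
    "production": SessionConfig(10_000, 256, Path("logs/prod")),
}


def load_config(environment: str) -> SessionConfig:
    """Return the validated settings for the named environment."""
    try:
        config = CONFIGS[environment]
    except KeyError:
        raise ValueError(f"unknown environment: {environment!r}") from None
    config.validate()
    return config
```

Keeping these values in one place per environment, rather than scattered across individual jobs, also makes the code migration step simpler, since moving a job from development to production does not require editing the job itself.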

Data Integration Process Design

What are the different data sources, and how are they to be accessed?
How is the data provided by the source systems: as an incremental or a full feed? If incremental, how are the incremental records identified?
What are the different target systems, and how will the data be loaded?
Validation and reconciliation process for the incoming source data
Handling late arriving dimension records
Handling late arriving fact records
Making the validation and transformation process dynamic (for example, driven by configurable rules rather than hard-coded logic)
Error handling process definition
Table structures for holding the error data and the error messages
Process control or audit information gathering process definition
Table structures for holding the process control data
Identifying reusable objects and their usage
Template creation for commonly used logic such as error handling, SCD handling, etc.
Data correction and reentry process
Metadata capture during the development process
Means of scheduling
Initial data load plan
Job failure and restartability methods