Data Integration Checklist – Environment Setup & Process Design

Data & AI Solutions

March 19, 2009

The following are the key points that need to be considered when setting up a Data Integration (DI) environment.

Data Integration Environment Setup

  • Repository setup and folder structures to hold the development objects (code) like transformations/mappings/jobs.
  • Coding standards and development process.
  • Document templates for low level design specifications and for capturing test case & results.
  • Version management process of the objects.
  • Backup and restore process of the repository.
  • Code migration process to move objects from one environment to another, for example from development to production.
  • Recommended configuration variables like commit interval, buffer size, log file path etc.
  • User group and security definition
  • Integration of the database metadata with the DI metadata, and of the DI metadata with the reporting environment
  • Process for impact analysis of change requests
  • Data Security needs for accessing the production data and the process of data sampling for testing
  • Roles and responsibilities of the environment users like Administrator, Designer etc.
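
As an illustration of the configuration variables item above, the recommended values can be recorded per environment in one place so that the same object migrates from development to production unchanged. This is only a minimal sketch: the class name `DIConfig`, the environment names, the numeric values and the `/var/log/di` path are all assumptions made for the example, not settings of any particular DI tool.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DIConfig:
    """Hypothetical per-environment settings for DI jobs (illustrative only)."""
    environment: str
    commit_interval: int      # number of rows written between commits
    buffer_size_bytes: int    # memory buffer allotted to a load session
    log_dir: Path             # root directory for job log files

    def __post_init__(self):
        # Reject values that can never be valid, so a bad setting fails early.
        if self.commit_interval <= 0:
            raise ValueError("commit_interval must be positive")
        if self.buffer_size_bytes <= 0:
            raise ValueError("buffer_size_bytes must be positive")

    def log_file(self, job_name: str) -> Path:
        # One log file per job, grouped by environment.
        return self.log_dir / self.environment / f"{job_name}.log"


# Assumed example values; real recommendations depend on the tool and hardware.
CONFIGS = {
    "dev": DIConfig("dev", commit_interval=1_000,
                    buffer_size_bytes=8 * 1024 * 1024,
                    log_dir=Path("/var/log/di")),
    "prod": DIConfig("prod", commit_interval=10_000,
                     buffer_size_bytes=64 * 1024 * 1024,
                     log_dir=Path("/var/log/di")),
}
```

A job would then look up `CONFIGS["prod"]` (or whichever environment it runs in) instead of hard-coding these values, which keeps the migration process free of code edits.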

Data Integration Process Design

  • What are the different data sources, and how are they to be accessed?
  • How is the data provided by the source systems: as an incremental or a full feed? If incremental, how are the changed records identified?
  • What are the different target systems, and how will the data be loaded?
  • Validation and reconciliation process for the incoming source data
  • Handling late arriving dimension records
  • Handling late arriving fact records
  • Making the validation and transformation process dynamic, for example driven by parameters or metadata rather than hard-coded rules
  • Error handling process definition
  • Table structures for holding the error data and the error messages
  • Process control or audit information gathering process definition
  • Table structures for holding the process control data
  • Determining reusable objects and their usage
  • Template creation for commonly used logic like error handling, SCD (slowly changing dimension) handling etc.
  • Data correction and reentry process
  • Metadata capture during the development process
  • Means of scheduling
  • Initial data load plan
  • Job failure and restartability methods
