Data Integration Challenge – Instantaneous Data Refresh, Bulk COMMIT

Posted by Muneeswara C Pandian
June 30th, 2009

A ‘data load window’ is the time frame during which changes to the data are applied in a BI system. A state of data inconsistency can exist during the data load window if individual tables are refreshed independently.

In a typical data integration process we assume that the tables are not used for reporting during the ‘data load window’. Within the window, the data across the tables is out of sync and keeps varying until the last ETL job finishes, because different jobs generally load different tables over different time intervals. Any user performing data analysis during this period can see strange behaviour in the data he is analyzing.

Applying the data changes across all tables (that are used for reporting) in one go, without a time gap, is called ‘Instantaneous Data Refresh’.

One way of ensuring that data changes are reflected instantly across multiple reporting tables is to use ‘Bulk Commit’. The ‘Bulk Commit’ process leverages the fact that all DML statements require a ‘COMMIT’ instruction before their changes become visible in the table data. Following are the steps involved in using ‘Bulk Commit’:

  • For every ‘reporting’ table, create a ‘temporary’ table with a similar structure. Add a flag column to the ‘temporary’ table to hold the value ‘1’ (Insert) or ‘0’ (Update).
  • Have the ETL jobs insert records into these ‘temporary’ tables with a flag of ‘1’ or ‘0’. If ETL jobs already exist, retain all the ETL logic ‘AS IS’ and just change the target to write into the ‘temporary’ tables; also add logic to determine whether each incoming source record is an Insert or an Update.
  • Build a single SQL script that executes after all the ‘temporary’ tables have been loaded. The script has one INSERT and one UPDATE statement for each ‘reporting’ table, and one COMMIT at the very end. Say we have 10 ‘reporting’ tables; then we would have 10 ‘temporary’ table structures and one SQL script file with 10 INSERT and 10 UPDATE statements, with a COMMIT as the last line.
  • After successful execution of the SQL script, truncate the ‘temporary’ tables.
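The steps above can be sketched end to end with a small script. This is a minimal illustration using Python’s built-in sqlite3 module, not the author’s actual implementation; the table and column names (sales, sales_tmp, flag) are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Reporting table and its 'temporary' staging counterpart,
# with a flag column: 1 = Insert, 0 = Update.
cur.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
cur.execute("CREATE TABLE sales_tmp (id INTEGER, amount REAL, flag INTEGER)")
conn.commit()

# Seed an existing reporting row, then stage one update (flag=0)
# and one insert (flag=1), as the ETL jobs would.
cur.execute("INSERT INTO sales VALUES (1, 100.0)")
cur.executemany("INSERT INTO sales_tmp VALUES (?, ?, ?)",
                [(1, 150.0, 0), (2, 75.0, 1)])
conn.commit()

# The single SQL script: one INSERT and one UPDATE per reporting
# table, and exactly one COMMIT at the end, so every change becomes
# visible to reports at the same instant.
cur.execute("""
    INSERT INTO sales (id, amount)
    SELECT id, amount FROM sales_tmp WHERE flag = 1
""")
cur.execute("""
    UPDATE sales
    SET amount = (SELECT t.amount FROM sales_tmp t
                  WHERE t.id = sales.id AND t.flag = 0)
    WHERE id IN (SELECT id FROM sales_tmp WHERE flag = 0)
""")
conn.commit()  # the one COMMIT that publishes all changes at once

# After a successful run, empty the staging tables for the next load
# (SQLite has no TRUNCATE; DELETE is its equivalent).
cur.execute("DELETE FROM sales_tmp")
conn.commit()

print(sorted(cur.execute("SELECT id, amount FROM sales").fetchall()))
# -> [(1, 150.0), (2, 75.0)]
```

With 10 reporting tables the middle section would simply repeat: 10 INSERTs and 10 UPDATEs, still followed by the single commit.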

Having a single COMMIT statement after multiple inserts and updates ensures that the data changes are reflected across all the tables instantaneously.

Please share other means of instantaneous data refresh.

Comments (1)

srinivas o - November 17th, 2009

I have seen a costly way of doing it, in which each table has a clone table and all reports point to views, say on Table A_Clone, while the ETL points to Table A. Once the ETL loads are done, the views are changed to point to A, and A_Clone is synced with A. The report views are pointed back to A_Clone once both tables are in sync. I guess Teradata has a way to sync two tables efficiently.
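The clone-and-view-swap approach described in this comment can be sketched as follows, again with sqlite3 purely for illustration; the names (sales, sales_clone, sales_rpt) are assumptions, and a real Teradata deployment would use its own view-replacement and table-sync facilities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Each reporting table has a clone; reports read through a view
# that currently points at the clone.
cur.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
cur.execute("CREATE TABLE sales_clone (id INTEGER, amount REAL)")
cur.execute("INSERT INTO sales_clone VALUES (1, 100.0)")
cur.execute("CREATE VIEW sales_rpt AS SELECT * FROM sales_clone")
conn.commit()

# ETL loads the base table while reports keep reading the clone.
cur.execute("INSERT INTO sales VALUES (1, 150.0)")
conn.commit()

# Swap: repoint the report view at the freshly loaded table...
cur.execute("DROP VIEW sales_rpt")
cur.execute("CREATE VIEW sales_rpt AS SELECT * FROM sales")
conn.commit()

# ...then sync the clone in the background for the next cycle
# (after which the view would be pointed back at the clone).
cur.execute("DELETE FROM sales_clone")
cur.execute("INSERT INTO sales_clone SELECT * FROM sales")
conn.commit()

print(cur.execute("SELECT * FROM sales_rpt").fetchall())
# -> [(1, 150.0)]
```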

Comments are closed.