Data Integration Challenge – Instantaneous Data Refresh, Other Options: Partition Exchange and Synonym

Posted by Muneeswara C Pandian
Comments (5)
July 29th, 2009

There are three options by which instantaneous data refresh across multiple tables can be achieved, they are

  1. Bulk COMMIT
  2. Partition Exchange
  3. Synonym

Earlier had discussed on what is instantaneous data refresh and the option of using Bulk COMMIT, now shall detail on the other two options. The following points are in reference to Oracle, but would be applicable to other databases as well.

‘Partition Exchange’: In this option the database feature to exchange partitions between tables is being leveraged. The steps involved in designing instantaneous data refresh using table partitions are

  1. Create two similar table structures, one a ‘partition’ table and the other one a ‘non-partition’ table. Both the tables will hold data in it. The ‘partition’ table will be accessed by the users for data analysis and reporting
  2. Have the data load process built against the ‘non-partition’ table
  3. On successful load run a script to exchange the ‘non-partition’ table data with the ‘partition’ table. This switch will happen instantaneously without any definite downtime

‘Synonym’: In this option the database feature Synonym is being leveraged, Synonyms decouple the underlying table to the applications that access it. The steps involved in designing instantaneous data refresh through synonym definition are

  1. Create two ‘data’ schemas and have same table structures with same table names in both the schemas. Both the schemas will hold the data.
  2. Create a ‘synonym’ schema and have Synonyms pointing to one of the ‘data’ schemas. The Synonym schema will be accessed by the users and other applications for data analysis and reporting. The Synonym schema will have only the definition and will not hold the data.
  3. Always the ETL process will run against the schema that is not being used for reporting, hence the ETL process will need to be designed to run against the two ‘data’ schemas alternatively
  4. After successful load, run a script to re-point the Synonym to the recently loaded ‘data’ schema. This switch will happen instantaneously without any definite downtime

Next time, shall compare these three options and determine the best option….

Comments (5)

srinivas o - November 17th, 2009

This has been very practical and sensible blogging especially for ETL and BI. Thanks Munees and all!!

Muneeswara C Pandian - November 7th, 2009

Thanks Kumaran for your observations. Yes for rolling partitions, the additional disk space required can be equivalent to that of one partition. In general we can say that the additional disk space required should be equivalent to that of the partition against which it has to be exchanged. In scenarios where the table is not partitioned on months, we will need to plan for disk space that is equivalent to that of an existing table.

Kumaran Karthikeyan - October 24th, 2009

Hi Munees, It was a good article. Choosing an approach is definitely based on a given circumstances. Partition Exchange is usually the sought after approach. I beg to differ with you on the "Additonal Disk Space requirement" for Partitioned tables. The article says the additional space requirement is twice the size of the actual table. That is not true. At most it holds one extra partition data and not the entire table. Let me explain a little bit more on the Partition Exchange approach. The non-partitioned table that was mentioned in the article is called staging table. The data is (direct parallel otherwise bulk commit) loaded into the staging table and the indexes are rebuilt and the constraints (if any) are enabled on the staging table. Then make a partition exchange call and the data is swapped in seconds. The benefit of this approach is that the whole process (pre load, load & post load) happens in the background on the staging table without any downtime to the base (actual) table. Any issues during pre-load/load/post-load affects only the staging table and not the actual table. Coming to the Additional Disk Space requirement, let us say a fact table is partitioned monthly as the process is a monthly processing and the residential criteria for data is 12 months. So we will have 12 rolling paritions on the fact table. Assuming we are in the 13th month of processing and the 13th month data is loaded into Staging table. Now when we make a partition exhange call the data between the 1st partition and the staging table is swapped. So the staging table holds only the 1st month data and not the entire data of the fact table. I would like to question Karthik on the approach of having two partitions - i.e., one for ETL and other for BI reports. What benefit is derived through this approach over the synonym? I would rather have 2 physical non-partitioned tables and switch the tables through synonyms. Note: Staging table can be a partitioned table provided the base table is sub-partitioned. Thanks Kumaran PS : Munees, By the way we were collegemates.

Muneeswara C Pandian - July 31st, 2009

Thanks Karthik, your input on one other way of using table partitions is also a good option, i shall include it in my next comparison table.

Karthik Srinivasan - July 30th, 2009

Hi Pandian, I just thought of sharing my thoughts - This is really an interesting and informative topic. Thanks for writing on this subject. In one of my earlier projects in our BI, we had handled the instantaneous data refresh in a slightly different way. 1)Create one table with two partitions and a partition identifier (partition flag) column. 2)One of the partitions will be used by reporting system. 3)One of the partitions will be used by the ETL load process. 4)Have one logic at the end of ETL load process to switch the partitions. 5)Have one logic at the reporting system to switch the partitions based on the partition flag column. This is another way of providing 24 * 7 availability of the reporting environment without any outage. Thanks and regards, Karthik

Comments are closed.