Data Integration Challenge – Understanding Lookup Process –II
Most of the leading products like Informatica, DataStage support all the three ways of lookup process in their product architecture. The following table lists ‘when to use’ and ‘when not to use’ the particular type of lookup process.
Advantage Cache Lookup:
The advantages of using cache file based lookups are that
- Fields that are present in the cache file is only that is needed by the lookup process so when querying the cache file the return would be faster as compared to the lookup table that might have more fields present
- The data structure of the cache file would be designed in such that the query from the ETL server is easily understood without any additional layer like SQL
Though in general it is said in user manuals that usage of cache files is best suited for low volume of lookup but in practical scenarios I have seen cache files are more valuable in terms of performance when the lookup records are huge.
Dynamic Cache: We have the concept of Dynamic Cache in Informatica and as well in Hash files of Datastage where you can Insert/Update or delete records from these cache file. The feature of updating the cache files is useful when we want to keep the cache file and the lookup table in sync.
Handling Multiple Return Records: Handling the return of multiple records by a lookup process is still a challenge not implemented in any of the leading products – limited to my knowledge. Probably in release 9 Informatica’s lookup can have a parameter for defining the number of records to return as an array like in its Normalizer transformer.
In Part III we shall see some of the things to be considered for better performance when using the lookup process