DI Challenge - Handling Files of different format with same subject content
In a Data Integration environment which has multiple OLTP systems existing for same business functionality one of the scenarios that occur quite common is that of these systems ‘providing files of different formats with same subject content’.
Different OLTP systems with same functionality may arise in organizations like in case of a bank having its core banking systems running on different products due to acquisition, merger or in a simple case of same application with multiple instances with country specific customizations.
For example data about same subject like ‘loan payment details’ would be received on a monthly basis from different OLTP systems in different layouts and formats. These files might arrive in different frequency and may be incremental or full files.
Always files having same subject content reach the same set of target tables in the data warehouse.
How do we handle such scenarios effectively and build a scalable Data Integration process?
The following steps help in handling such situations effectively
• Since all the files provide data related to one common subject content, prepare a Universal Set of fields that would represent that subject. For e.g., for any loan payment subject we would have a set of fields identified as a Universal Set representing details about the guarantors, borrower, loan account etc. This Universal Field list is called Common Standard layout (CSL)
• Define the CSL fields with a Business Domain specialist and define certain fields in the CSL as mandatory or NOT NULL fields, which all source files should provide
• Build a set of ETL process that process the data based on the CSL layout and populates the target tables. The CSL layout could be a Table or Flat File. In case the CSL is table define the fields as character. All validations that are common to that subject are built in this layer.
• Build individual ETL process for each file which maps the source files fields to the CSL structure. All file specific validations are built in this layer.
Benefits of this approach
• Conversion of all source file formats to CSL ensured that all the common rules are developed as reusable components
• Addition of a new file that provides same subject content is easier, we need to just build a process to map the new file to the CSL structure