Accelerate the Data Integration Layer development through the Reporting Layer

Posted by Muneeswara C Pandian
Comments (4)
April 14th, 2010

In a typical data warehouse setup, we would first extract, cleanse and bring the required base data into the data warehouse. We would then transform or aggregate the base data into additional derived tables for easier reporting; these are processes that are optional but are done to get the data ‘report friendly’.

The decision to have calculation logic embedded in the data integration (DI) layer or the reporting layer can be primarily driven by two factors

Report execution performance and simplifying report development

When the DI layer pre calculates and makes the values directly available for the reporting process, then there is a performance improvement in the report delivery. But this does not provide the flexibility to change any parameters in the calculation formula in run time during analysis. Cube based reporting layer also falls under this case.

Flexibility for users to manage the calculations during analysis

When all calculation logic is embedded within the reporting layer then there is a lapse in the report delivery since the report server spends time in performing these calculations during runtime.

In many the aggregation or transformation logic required to get the data ‘report friendly’ involves different levels of aggregation functions which could be counts, sum, percentage etc across multiple dimensions with different set of records from the same fact table. In summary we are looking at good amount of business logic to be built into the data integration process to prepare the data for the reporting. This directly impacts the complexity of the data integration process and the amount of time to be spent in testing the data integration process, especially during the first release of the data warehouse, when the knowledge about the data is an evolving process.

Recently observed that during the data warehouse development process we can get the ‘report friendly’ requirements started at the reporting layer level and then gradually move these logics into the DI layer based on need.

We had 100+ key calculations to be developed in a 4 weeks timeframe, getting them calculated and tested was a challenge. We decided to build all the logic of calculations from the base data in the reporting layer, we went through multiple iterations to finalize the calculation logic with the users, but the process was much easier to handle.

Reporting layer provided a much easier flexibility of debugging and verifying each calculation, we were able to add a calculation, run the result, pass it to business the sample outputs and validate…all of this was done within the reporting layer team…with very less co-ordination with the DI team. Data Integration process debugging is time consuming, we will need to fall back and forth between the DI platform and the database platform.

Reporting layer provided the flexibility to fix any issues identified by the users and get back to them immediately; it would have not been easier through data integration layer.

Was easy enough to generate the SQL query from the reporting interface and add that logic to the DI process for pre-computing the calculations for report better performance

In summary reporting layer provided

  • Flexibility in debugging the calculation logic
  • Easier to add calculations one by one and validate the output with the users
  • Quicker to fix any issues identified by the users
  • Lesser dependency on the DI and database team
  • Finally helped in deciding what calculations to move to the DI layer, also generated the SQL Query for the DI team to embed in the DI layer

Thanks for reading, let me know your thoughts on building DI logic specific for the reporting layer… ‘report friendly’.

Comments (4)

Muneeswara C Pandian - August 25th, 2010

Thanks Peggy for reading, yes you are right that for an environment that has multiple reporting tools being used, all the metrics calculation should be built as part of data integration so that we maintain one version of metrics calcutaion. But in a scenario where we need to build a 100+ metrics in 4 weeks timeframe, if we are to start this process from the DI layer then the amount of time spent in user testing , incorporating user feedbacks, debugging is much higher, all of these can be done in a much shorter time in the reporting layer. In an environment with multiple reporting tools, get these metrics tested in one of the reporting layers, once tested and validated by users move the logic to the DI layer which can be be made available for all the reporting tools. Found this process of getting to build metrics first on the reporting layer and then moving it to DI layer much faster , than the traditional way of getting the metrics calculation first in the DI layer.

Peggy Mathias - August 20th, 2010

You are presuming that the business is using only one tool for reporting - not necessarily a 'given' in complex environments. If mulitple reporting platforms are used to satisfy different business constituents then the calculations would have to be rebuilt on each of the reporting platforms. This is duplicate work to develop, to ensure the calcuations are performed with the exact same results and that they calculations are kept in sync over time.

Muneeswara C Pandian - April 29th, 2010

Thanks Jagdish for reading. The experience that we went through in the development process was that, it was easier to build the complex reporting requirement calculations in the reporting layer and then moving it to the DI layer as required.

Jagdish Suryawanshi - April 19th, 2010

Hi Pandian, It’s really a good analysis. Thanks for the sharing knowledge. It depends on who are the report users. Generally most of the users are from higher management then absolutely they are not interested in the transactional details of their business. They are mostly interested in the summary levels and multidimensional analysis. And these requirements can be fulfilled in DI only. It also depends on the user requirements whether to go for the DI or Reporting Layer. If the requirement is for the standard/canned reports then the DI is better and if user have some ad-hoc report requirements then Reporting layer is better. In most of the cases the requirements from the user are very clear and they need performance based reports. And if we are using the Reporting layer i.e. most of the calculations are done at the report level then definitely there will be performance issues. And after all the decisions are based on the summary level. Thanks Jagdish.

Comments are closed.