Posted by Karthikeyan Sankaran
March 24th, 2008

For as long as I can remember, the definition given for Metadata has been “Data about Data”. We have all said this in interviews, heard it from candidates, seen it in presentations, and (almost) always nodded our heads in agreement.

In the transaction processing (OLTP) world, where “data-in” is the paradigm, this definition is precise. Databases store the business data in relational format, and the system tables/catalogs describe the structure of that data: the columns, their types, sizes, etc. This data about the structure of the business data is “Metadata”.
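To make the OLTP case concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders table and its columns are hypothetical, and SQLite's PRAGMA table_info and sqlite_master table stand in for the "system tables / catalogs"; other databases expose the same kind of information through their own catalogs (for example INFORMATION_SCHEMA views).

```python
import sqlite3

# An in-memory SQLite database stands in for an OLTP system (illustrative only).
conn = sqlite3.connect(":memory:")

# Business data: a hypothetical orders table with one row.
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "customer_name VARCHAR(100) NOT NULL, "
    "amount DECIMAL(10,2))"
)
conn.execute("INSERT INTO orders VALUES (1, 'Acme Corp', 250.00)")

# Technical metadata: the catalog describes the structure of that data.
# PRAGMA table_info yields one (cid, name, type, notnull, dflt_value, pk) row per column.
columns = conn.execute("PRAGMA table_info(orders)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(f"{name:<15} {col_type:<15} not_null={bool(notnull)} primary_key={bool(pk)}")

# The original CREATE statement is also recorded in the sqlite_master catalog table.
(ddl,) = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'orders'"
).fetchone()
print(ddl)
```

Note that nothing in this output says what "amount" means to the business or which reports use it; the catalog only knows the structure.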

In the Business Intelligence world, that definition of metadata is incomplete. A more complete definition of metadata has two components:

Metadata in BI = “Data about Data” + “Information about Information”

The first component, “Data about Data”, is “Technical Metadata” and is similar to the metadata in the OLTP world. That said, technical metadata in BI is arguably more complex, because it has to cover not only the databases but also the ETL and reporting tools. Each tool in the overall BI landscape has its own metadata, and all of it has to be examined together to understand data lineage and similar questions.
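The sketch below illustrates why lineage needs the metadata of all tools together. It is a hypothetical example, not a CWM model or any vendor's export format: each tool's metadata is assumed to be a simple mapping from a target element to the elements it is derived from, and all table, column and report names are invented.

```python
from collections import deque

# Hypothetical metadata exported by three separate tools.
# Each entry maps a target element to the upstream elements it is derived from.
etl_metadata = {
    "staging.orders.amount": ["oltp.orders.amount"],
    "dw.fact_sales.amount": ["staging.orders.amount"],
    "dw.fact_sales.discount": ["oltp.orders.discount"],
}
database_metadata = {  # e.g. view definitions in the warehouse
    "dw.v_sales.net_amount": ["dw.fact_sales.amount", "dw.fact_sales.discount"],
}
report_metadata = {  # e.g. report fields bound to warehouse columns
    "Sales Summary.Net Sales": ["dw.v_sales.net_amount"],
}


def merge_metadata(*sources):
    """Combine per-tool mappings into one lineage graph."""
    combined = {}
    for source in sources:
        for target, upstream in source.items():
            combined.setdefault(target, []).extend(upstream)
    return combined


def trace_lineage(element, lineage):
    """Breadth-first walk upstream; returns every ancestor once, nearest first."""
    ancestors = []
    visited = {element}
    queue = deque([element])
    while queue:
        current = queue.popleft()
        for parent in lineage.get(current, []):
            if parent not in visited:
                visited.add(parent)
                ancestors.append(parent)
                queue.append(parent)
    return ancestors


lineage = merge_metadata(etl_metadata, database_metadata, report_metadata)
upstream = trace_lineage("Sales Summary.Net Sales", lineage)
for element in upstream:
    print(element)
```

Tracing the report field against the report tool's metadata alone stops at the warehouse view; only the merged graph reaches back to the OLTP columns.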

Even among BI tools, there are different categories: tools that expose their metadata completely, tools that provide a handle to their metadata through pre-defined APIs, and tools that do not allow any access to their metadata. Given the industry direction and the evolution of the Common Warehouse Metamodel (CWM) standard, it is only a matter of time before tool architectures are designed to expose their technical metadata. CWM is a fascinating topic in its own right, and you can get a feel for it by visiting this website:

To me, as a BI practitioner, the second piece of the metadata puzzle is more interesting. The “Information about information” aspect of metadata is “Business Metadata”, and understanding it is crucial to implementing the BI vision in any enterprise.

An analytical information consumer has two important requirements:

  1. Direction on where to access the required analytical content

    • Where can I get Sales by Product for different locations over the last 2 years?
    • I am interested in customer-related analytics. Where do I access them?
  2. Guidance on how to make sense of the content once it is retrieved

    • The chart in the report shows Forecasted Sales for the next quarter. How is this value calculated?
    • Does the total inventory value displayed in the report include raw material inventory, or does it exclude it?

When properly organized, business metadata should address both of the requirements mentioned above.
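As a minimal sketch of how organized business metadata could serve both needs, the example below models a tiny business glossary. The BusinessTerm structure, the find_content and explain helpers, and every term, location and definition in it are hypothetical; the definitions are placeholders for whatever rules an organization actually agrees on.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessTerm:
    """One entry in a hypothetical business metadata glossary."""
    name: str
    definition: str                  # guidance: how to interpret the value
    subject_areas: list[str]         # e.g. "Sales", "Customer"
    locations: list[str] = field(default_factory=list)  # direction: where it lives


# Illustrative entries only; real definitions come from the business owners.
GLOSSARY = [
    BusinessTerm("Net Sales", "Illustrative: gross sales minus returns and discounts.",
                 ["Sales"], ["Sales Analysis dashboard"]),
    BusinessTerm("Customer Lifetime Value", "Illustrative: projected revenue per customer.",
                 ["Customer"], ["Customer Analytics workspace"]),
    BusinessTerm("Total Inventory Value",
                 "Illustrative: includes raw material, work-in-progress and finished goods.",
                 ["Inventory"], ["Inventory Position report"]),
]


def find_content(keyword: str) -> list[str]:
    """Requirement 1 (direction): where can I find content about this topic?"""
    keyword = keyword.lower()
    results = []
    for term in GLOSSARY:
        searchable = [term.name.lower()] + [area.lower() for area in term.subject_areas]
        if any(keyword in text for text in searchable):
            for location in term.locations:
                if location not in results:
                    results.append(location)
    return results


def explain(term_name: str) -> str:
    """Requirement 2 (guidance): what does this value actually mean?"""
    for term in GLOSSARY:
        if term.name.lower() == term_name.lower():
            return term.definition
    raise KeyError(f"No business definition recorded for {term_name!r}")


print(find_content("customer"))
print(explain("Total Inventory Value"))
```

Keeping direction (locations) and guidance (definitions) on the same glossary entry is one possible design; the point is that neither piece of information can be read from the database catalog.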

Metadata management in BI deals with integrating technical and business metadata in a way that is useful to the organization. The challenge becomes even more daunting when one considers both structured and unstructured data. Even so, it is important for BI practitioners to understand the true nature of BI metadata and to provide implementable solutions in their specific organizational context.

In future posts, I will discuss this fascinating area of metadata management, including its manifestation as “Technical and Business Metadata” in both structured and unstructured data domains.

Comments (2)

Karthikeyan Sankaran - May 8th, 2008

Hi Aaron, thanks a lot for sharing your thoughts. You have raised an interesting idea: a feedback loop to operational systems, not only for business data but for metadata as well. This would probably ensure that incorrect data can be corrected at the point of entry itself. Please do keep reading. Thanks once again.

Aaron Johal - May 5th, 2008

Nice one, I like it a lot! I have been highlighting this difference in the classroom for many years and agree completely that the concept of Meta Data, although useful enough as it is in the relational world, has to be extended to cover BI requirements. I would go further and say that the Informational Meta Data that you mention should be looped back to the Meta Data in relational systems to control what enters and comes through the data flow channels in the future! Warm regards, Aaron Kindarcats Limited
