Sunday, February 14, 2016

Data Lineage/Data Modeling

With the advent of visualization and sophisticated reporting tools, one aspect has become very important for data analysis: for reports and visualizations to be valid and effective, the data behind the charts and graphs must be properly modeled and validated. When analysts look at a report, they very soon ask what makes up each data element and where the data is sourced from. They are also interested in how the data has been transformed by the time it is shown in the report. For reports and visualizations to be effective, it is therefore very important to have a strategy for data lineage and data modeling.
Typically, data is drawn from various sources and finally ends up in a data warehouse or data mart. During this process it is very important to track how the data moves from one system to another. A data element may move through the process without any transformation. It may also be transformed along the way: it could be cast or trimmed, or extracted into multiple columns in the target data mart. A data element could become part of a calculation, or it could end up as a candidate for a lookup table. To capture all of these movements and transformations, it helps to have a repository where the metadata and data lineage can be stored. Such a metadata tool is very handy for analysts and developers who need to understand what makes up each data element. It can also provide the context needed to develop an effective visualization solution.
The other aspect that helps improve visualizations is data modeling. Data modeling, when done correctly with a proper framework, can provide immense benefits to the overall data services and visualization strategy. It is extremely important to know the data that is being worked on or extracted. There are products in the market that cover the data lineage piece of the puzzle.
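To make the repository idea concrete, here is a minimal sketch in Python of how lineage metadata might be recorded and queried. Everything in it is hypothetical and for illustration only: the class names (`LineageRepository`, `LineageEdge`, `TransformKind`), the column names, and the in-memory storage are assumptions, not the design of any particular product. A real metadata tool would persist this information and capture it from the ETL process itself.

```python
from dataclasses import dataclass, field
from enum import Enum


class TransformKind(Enum):
    """The kinds of data movement described above (hypothetical labels)."""
    PASS_THROUGH = "pass-through"
    CAST_OR_TRIM = "cast/trim"
    SPLIT = "split into multiple columns"
    CALCULATION = "calculation"
    LOOKUP = "lookup"


@dataclass(frozen=True)
class LineageEdge:
    """One hop of a data element from a source column to a target column."""
    source: str
    target: str
    kind: TransformKind
    note: str = ""


@dataclass
class LineageRepository:
    """In-memory stand-in for a metadata/lineage store."""
    edges: list = field(default_factory=list)

    def record(self, source, target, kind, note=""):
        self.edges.append(LineageEdge(source, target, kind, note))

    def trace(self, target):
        """Return every edge that feeds `target`, walking upstream."""
        result, seen, stack = [], set(), [target]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            for edge in self.edges:
                if edge.target == current:
                    result.append(edge)
                    stack.append(edge.source)
        return result

    def origins(self, target):
        """Return the upstream columns that have no recorded source of their own."""
        all_targets = {edge.target for edge in self.edges}
        return sorted({e.source for e in self.trace(target)
                       if e.source not in all_targets})


# Example with made-up column names.
repo = LineageRepository()
repo.record("crm.customer.full_name", "staging.customer.full_name",
            TransformKind.CAST_OR_TRIM, "trim whitespace")
repo.record("staging.customer.full_name", "mart.dim_customer.first_name",
            TransformKind.SPLIT)
repo.record("staging.customer.full_name", "mart.dim_customer.last_name",
            TransformKind.SPLIT)
repo.record("erp.orders.quantity", "mart.fact_sales.revenue",
            TransformKind.CALCULATION, "quantity * unit_price")
repo.record("erp.orders.unit_price", "mart.fact_sales.revenue",
            TransformKind.CALCULATION, "quantity * unit_price")

for edge in repo.trace("mart.dim_customer.first_name"):
    print(f"{edge.source} -> {edge.target} [{edge.kind.value}] {edge.note}")
```

With something like this in place, an analyst looking at a report column can call `repo.origins(...)` to see which source columns it ultimately comes from, and `repo.trace(...)` to see each transformation applied along the way.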
Pyramid Analytics, one of the more popular analytics vendors, has come out with a product called BI Office that includes a Governed Data Discovery Platform. The vendor's website has all the details.
As per the website, here is an overview of the data governance feature in BI Office:
"BI Office provides you with self-service content creation in a centralized, shared paradigm – that also tracks the content life-cycle".
When companies build a data services or data provisioning strategy, data governance needs to be an integral part of the whole vision. As usage scales and demand for services grows, data governance becomes increasingly critical to making the data services strategy successful.

