Sunday, September 27, 2020

Data Discovery Tools

In today's world, data is the new asset or, as some say, the new oil. Whether it lives up to either label depends on how much valid information and insight can be drawn from the data assets. For a data project to be viable, or for the data to be useful to the business, it is extremely important to understand the data. This is where data discovery comes in, and in the past few years there have been significant developments in this domain. Earlier, data discovery was a lot of grunt work with very manual processes, and updating metadata was very time consuming.

One of the data discovery products that I have been looking at and following closely is Atlan, which I mentioned briefly in an earlier blog post. I signed up for an onboarding trial with Atlan, and the whole onboarding process was very smooth; folks from Atlan guided me through it. I was very excited to see what the product has to offer, given the pain points we have in our current process.

Once I logged in, I was presented with a Google-like search interface, with options for Discover, Glossary, Classification, and Access on the left side of the home page. In the search bar, you type in the data asset that you want to search for. One critical prerequisite is that you have connected Atlan to a public cloud provider such as Amazon or Azure; in my case it was connected to a Snowflake DB/warehouse. When you click the search button, all the data assets related to the search term are pulled up. The first thing I noticed is that each result shows a snapshot of the row count and the number of columns.
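To make the row count and column count snapshot concrete, here is a minimal sketch of the idea in Python. It is not how Atlan works internally; it uses an in-memory SQLite database as a stand-in for the Snowflake warehouse, and the table names and the `search_assets` and `table_snapshot` helpers are hypothetical, made up for illustration.

```python
import sqlite3


def table_snapshot(conn, table):
    """Return (row_count, column_count) for a single table."""
    # Table names cannot be bound as SQL parameters, so the name is quoted
    # inline. That is acceptable here because the names come from the
    # catalog itself, not from user input.
    row_count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return row_count, len(columns)


def search_assets(conn, term):
    """Find tables whose name contains the search term, with a snapshot each."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name LIKE ?",
        (f"%{term}%",),
    ).fetchall()
    return {name: table_snapshot(conn, name) for (name,) in rows}


# Hypothetical sample data standing in for a real warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_orders (id INTEGER, customer TEXT, amount REAL)")
conn.execute("CREATE TABLE customer_profile (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE inventory (sku TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO customer_orders VALUES (?, ?, ?)",
    [(1, "alice", 10.0), (2, "bob", 5.5)],
)

results = search_assets(conn, "customer")
for name, (rows, cols) in sorted(results.items()):
    print(f"{name}: {rows} rows, {cols} columns")
```

A real discovery tool would read these numbers from the warehouse's own metadata rather than counting rows on every search, but the shape of the result is the same: one small summary per matching asset.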

When you click on a table, you are presented with a preview window showing the data, with column information on the right and, below that, the classification along with owner and SME information. Seeing all of this information in one window is very efficient and helps one start building context around the data. The column list also includes a description for each column, which can be edited and updated. For an analyst or business user, this feature is extremely useful. Above the data preview window, you are provided with Query, Lineage, Profile, and Settings options, each of which opens up deeper functionality when you click on it. The interface flows very logically and is set up in such a way that all operations related to data discovery and analysis can be done in this tool. I will write a follow-up blog post as I explore the lineage aspect of the tool in more depth.

One of the key aspects of giving a data project a solid foundation is having a very good metadata glossary of the data points. This would contain business entities, logical entities, and relationships, along with lineage. In Atlan, this is accomplished with the Glossary option on the left pane of the dashboard. Within the Glossary, one can add Categories and Terms. Categories can be used for setting up Business Value Chains, Business/Logical Entities, Sourcing, API, and Provisioning, which in turn provide context around the data. Terms are useful for identifying individual data elements and can be linked back to the actual tables and columns. The link feature is also available for Categories. Atlan also provides a way to bulk load Glossary items using a template that can be downloaded for Categories and Terms.
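As a rough illustration of how terms, categories, and column links fit together for a bulk load, here is a small Python sketch that builds a few glossary terms and writes them out as CSV. The column headers, the `Term` class, and the `terms_to_csv` helper are all assumptions of mine; Atlan's actual downloadable template may well use a different layout, so treat this only as a way to think about the structure.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Term:
    """One glossary term, filed under a category and linked to columns."""
    name: str
    description: str
    category: str
    # Links back to physical data, written as "table.column" (hypothetical format).
    linked_columns: list = field(default_factory=list)


def terms_to_csv(terms):
    """Serialize terms to CSV text using a made-up header layout."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["term", "category", "description", "linked_columns"])
    for t in terms:
        # Multiple links are packed into one cell, separated by semicolons.
        writer.writerow([t.name, t.category, t.description, ";".join(t.linked_columns)])
    return buf.getvalue()


# Hypothetical terms for illustration only.
terms = [
    Term("Order Amount", "Total value of an order", "Business Entities",
         ["customer_orders.amount"]),
    Term("Customer Name", "Full name of the customer", "Business Entities",
         ["customer_profile.name"]),
]

csv_text = terms_to_csv(terms)
print(csv_text)
```

Keeping the category on each term row, rather than in a separate file, makes a single sheet easy to review with business users before loading it.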

More coming as I dig deeper into some of the use cases... Keep Learning, Keep Growing.
