Friday, October 16, 2020

Workflow, Data Masking - Data Ops

 Dataops is becoming more prevalent in today's data driven projects, due to the speed at which these projects need to be executed and also be meaningful at the same time. There are tools in the Dataops space that are provide lot of different features, companies like Atlan, Zaloni are very popular in this space, in fact Atlan was named in the Gartner 2020 Data Ops Vendors list. Now coming to the different features needed in these tools, there are concepts that are becoming very important, those are Data Masking and Workflows. It is very well know that in Data Driven Projects testing with valid subsets of data becomes very important. One of the biggest challenges faced today in Data Projects is the availability of test data at the right time in order to test functionality, usually it takes a process to get test beds ready.

With Dataops tools, one of the features that is promised is Data Masking/Obfuscation which means the production data could be obfuscated and be made available quickly for testing. In the data masking process there is this concept of identifying data elements that are categorized as NPI or Confidential and obfuscating those elements. Dataops tools provide mechanism where masking can be done very quickly, this really helps the process of testing in test environments. The impacts become more visible when one is working on major projects where testing has to be done through multiple cycles and also if one is in a agile environment. One of the leading Data Analytics expert Sol Rashidi mentions about 3 S's - Speed, Scale and Shareability, these are what is expected from Data projects apart from providing Business Value. In order to Satisfy the above requirements, Data masking being made available in data Ops tools is very welcoming indeed.

The other concept i wanted to discuss here is the concept of Workflows in Dataops. When we look at the data flow in general, there are source systems, data is collected into a HUB/Datawarehouse and then data is provisioned out to different applications/consumers. In order to achieve this typically lot of time is spent in developing ETL flows, moving data into different databases and curate the data to be provisioned. This involves a lot of time, cost and infrastructure. In order to alleviate these challenges, Dataops tools today introduce a concept called Workflows.  The main concept here is to automate the flow of data from source to target, in addition to that also execute data quality rules, profile the data and prepare the data for consumption to various systems. Workflows do emphasize the importance of data quality checks which are much more than data validations, these can be customized to verify the type of data that need be to be present with each data attribute. When performing data quality checks in the workflow, the tools also provide the ability to set up Custom DQ Rule and provides Alerts which can be sent to teams who provide the data. There are a couple of vendors who offer the Workflow functionality, they are Zaloni Arena Product and Atlan has it in their Trail offering, hope to be in production soon. Working with quality is fundamental for any Data project, building a good framework with dataops tools provides the necessary Governance and Guardrails. Such concepts will go a long way in setting up quality data platforms which are very essential for AI and Machine Learning Initiatives.

Vendor Links:

No comments:

Post a Comment