Monday, October 5, 2020

DataOps - What is DataOps...

We live in a world of metaphors. New terms and metaphors are heard every day, and they cause a lot of confusion, pressure and some amount of chaos. It is important to filter out the noise and focus on the needs of the business and its customers/stakeholders. There are continuous attempts to streamline data projects because they often suffer from unwanted costs, project delays and failed implementations. The whole purpose of a data project should be to add value for the business, improve customer experiences and integrate systems better. In the Agile world, we have heard of DevOps as a way to provide continuous integration and continuous deployment; DataOps emerged in a similar way. What is DataOps?

As defined by the DataOps manifesto (https://www.dataopsmanifesto.org/):
Through firsthand experience working with data across organizations, tools, and industries we have uncovered a better way to develop and deliver analytics that we call DataOps. 
Much like the Agile manifesto, DataOps is built around a set of principles. To facilitate DataOps, there are tools in the market today that try to tackle different aspects of it. Some of the major areas in DataOps include:

Data Quality - Very important: the ability to perform simple to complex data quality checks at the time of data ingestion. Data quality needs to be implemented as part of workflows, so that the data engineer can track the records that were imported successfully and remediate the records that failed.

Workflows - Ability to track data from sourcing to provisioning, including the ability to profile data and apply DQ checks along the way. Workflows need to be persisted.

Data Lineage - Ability to track how data points are connected from source systems all the way to provisioning systems.

Metadata Management - Categorizing all the different business and logical entities within a value chain, and also having a horizontal view across the enterprise.

Data Insights - Based on the aspects mentioned above, the ability to generate valuable insights and provide business value for customers/stakeholders.

Self Service - DataOps also relies on building platforms where different types of personas/users are able to handle their requests in an efficient manner.

Handle the 3 D's: They are Technical Debt, Data Debt and Brain Debt. I would like to thank Data Engineer/Cloud Consultant Bobby Allen for sharing this concept with me. It is extremely important to handle these when taking up data projects.

Ability to build and dispose of environments - Data projects rely heavily on data, so the ability to build environments for a project and quickly dismantle them for newer projects is key.
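To make the Data Quality area above a little more concrete, here is a minimal Python sketch of checks applied at ingestion time, keeping track of which records passed and which failed so that the failures can be remediated later. Everything in it is an assumption made for illustration: the `ingest` function, the `IngestionResult` class, the rule names and the record fields (`customer_id`, `amount`) are hypothetical and do not come from any particular DataOps tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical rule shape: a readable name plus a predicate over one record.
Rule = Tuple[str, Callable[[Dict], bool]]


@dataclass
class IngestionResult:
    """Keeps imported records apart from records that need remediation."""
    passed: List[Dict] = field(default_factory=list)
    # Each failed entry pairs the record with the names of the rules it broke.
    failed: List[Tuple[Dict, List[str]]] = field(default_factory=list)


def ingest(records: List[Dict], rules: List[Rule]) -> IngestionResult:
    """Apply every rule to every record and sort records into passed/failed."""
    result = IngestionResult()
    for record in records:
        broken = [name for name, check in rules if not check(record)]
        if broken:
            result.failed.append((record, broken))
        else:
            result.passed.append(record)
    return result


# Illustrative rules only; real checks would come from the business.
RULES: List[Rule] = [
    ("customer_id present", lambda r: bool(r.get("customer_id"))),
    (
        "amount is non-negative",
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    ),
]

records = [
    {"customer_id": "C1", "amount": 10.5},
    {"customer_id": "", "amount": 3},
    {"customer_id": "C3", "amount": -1},
]

result = ingest(records, RULES)
print(len(result.passed), "imported,", len(result.failed), "to remediate")
for record, broken in result.failed:
    print(record, "failed:", broken)
```

Keeping the failed records together with the names of the rules they broke is one simple way to give the data engineer something to act on; in a real workflow this result would be persisted rather than printed.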

It is very important to implement DataOps with a clear view of the value it adds for the business and of how data will improve customer experience.

There are tools that implement DataOps. Some of the tools already in the market are: Atlan, Amazon Athena.

