Wednesday, November 18, 2020

Data Cloud

 One of the more commonly used buzzwords these days is Data Cloud; it has been used mainly as a marketing term in the cloud domain across different businesses and organizations. There is a concept underneath the term: it is mainly aimed at having data available in, or migrated to, public cloud offerings such as AWS, Azure, and Google Cloud. One of the key projects undertaken by many organizations across different businesses is making data available in the cloud without compromising on the security of that data. I had the opportunity to listen to the Data Cloud Summit 2020 organized by Snowflake. It was a virtual event where features of Snowflake were discussed in different sessions. There were also use cases presented by different customers and vendor partners on how they are utilizing Snowflake for their data projects and how much impact the product has had on their business. There were some interesting points I picked up from the different sessions; I am listing them below. They cover a variety of topics related to data.

1. Compute/Storage: Snowflake separates compute from storage. This is one of the main concepts in the product, highlighted by a lot of customers in terms of how it helps their daily data operations and business.
2. Scalability: The ability to ingest multiple workloads; this is a common requirement across all customers.
3. Simplify: Simplification of the data pipeline. How can one take raw data and turn it into actionable insights quickly? This is the elapsed time. One of the questions raised was: does all the transformation of the data happen during the early hours of the morning? Can it be spread out or done in real time?
4. Data Silos: Breaking down data silos is a significant effort being undertaken by different organizations. Data silos have a direct and indirect negative impact on cost and efficiency. One of the reasons for using a product like Snowflake is to break down data silos by having data in one place. This allows better understandability and searchability of the data in an organization.
5. Proof of Value: Data cloud products or cloud offerings need to provide proof of value. It has to be tangible for the business: how does the investment in cloud provide better results?
6. Orchestration: Since the movement to cloud infrastructure is taking place at different paces, there needs to be better orchestration across multiple cloud installations. This can lead to better abstraction, and it is a challenge a lot of companies face today.
7. Data is an Asset: Data can be monetized by generating value for the business and by reducing costs.
8. Support: Snowflake provides good, cost-effective support tools. Some customers explained how Snowflake's uptime has been very good in spite of the huge data loads coming into the system.
9. Data: What type of information needs to be sent out or provisioned? One of the guests mentioned two important aspects with respect to data: 1. the information a person needs to know, and 2. how that information will affect you.

Overall, a lot of information for a single-day event; I am sure each of the aspects mentioned above can lead to deeper discussions and/or projects. The event provided an overall perspective of where things are headed in the data space and how companies are planning their work in the coming years.

Monday, November 9, 2020

Product Manager - Thoughts/Observations

 One of the professions/roles that is talked about, discussed, and in demand, especially in the technology world, is that of a product manager. There is a lot of inquiry about and need for product managers; at the same time, the recent COVID crisis has challenged businesses, and hence a lot of product managers also lost jobs. An interesting trend I have noticed is that product managers range from folks with a couple of years of experience to folks with 10 years or more of work experience. It is a very broad spectrum, and hence a lot of questions are raised around who can be a good product manager. I have also noticed that some folks who want to become a product manager are the ones who do not want to code in certain areas of the business. Let me try to take a deeper look and pen down my observations. In some cases the product manager role has become glamorous, in the sense that it feels nice to say one is a PM.

A Product Manager, in my discussions with colleagues and professionals, is a very important and crucial role in an organization. The role is at the intersection of the following:

1. Business
2. Technology
3. Customers

So a person approaching a PM role needs to understand the dynamics of the above 3 components and how they work together. What is the primary business of the organization, and what products does it have for customers? Secondly, what type of technology is used to build the products? Thirdly, who are your customers? In summary, one needs to understand the high-level picture and also the details behind what is being delivered.

Let us delve a little deeper into each of the components:
  • Business: Understand the business strategy for the company and the line of business. Get a gauge on the stakeholders; understand the budget/resources that could be available for your product in terms of development, research, and maintenance; know the interfacing units and dependencies. How is the company performing financially, and what are its target markets?
  • Technology: Understand the tools being used to develop the product. What type of vendor lock-in is there, or is it based on an open-source architecture with less vendor lock-in? In terms of data, what are the data sources? Are they very disparate or well integrated? Are there opportunities to streamline the data? One important aspect being experimented with today is whether product management can be totally data driven, with decisions justified by data. This is going to be even more important in a data/information-filled world.
  • Customers: Get continuous feedback from customers by conducting surveys and talking to them about product usage and issues faced. Conduct usability studies and feed the findings back into the product backlog. Adopt an agile approach to building the product and collaborate with customers to get the proper engagement.
In summary, Product Manager is an exciting but challenging role, and it is imperative that one has the proper grooming/mentoring to get to a PM role. There is a lot of temptation to cut corners (like "I won't do certain things...") to achieve it, but the consequences can be devastating and could erode self-confidence. It would be best to have a plan of action and a set of goals, and to work with a mentor to achieve the results.

Sunday, October 25, 2020

Unlocking Insights-Data Quality

 One of the main buzzwords we constantly hear about is insights, or unlocking insights from data. This has been one of the main selling points when it comes to selling technology to the business. The sophistication of tools is a welcome feature for unearthing insights; at the same time, what are the critical components needed to get meaningful insights? One of the fundamental requirements is to have a solid end-to-end data pipeline. In order to have this, the following need to be in place:

1. Data Governance/Lineage.
2. Metadata Quality/Entities.
3. Valid Domain Value Chain.
4. Customer Data (Profile/Accounts/Interactions via different Channels).
5. Data Quality including Workflow/Validation/Data Test Beds/Deployment.
6. Track Data Assets Related to a Domain.
7. Business Data Owner - A person or a group of people who can help identify the business purpose/meaning of all the data points in the domain.
8. Ability to Handle Technical Debt - How to systematically handle technical debt; a very common scenario in organizations grown by mergers and acquisitions.
9. Scale, Share and Speed - Can the available architecture and infrastructure handle the frequency/speed of data requests by the business?

The elements mentioned above are very important; a good interplay among them is needed in order to generate valid insights. For insights there are 2 main components:
1. Insight Rules - Rules which are executed when certain events happen and certain business conditions are met.
2. Insight Triggers - Capture data points when certain events happen; for example, a credit card transaction was made at Lowe's or Home Depot, someone paid an SAT entrance exam fee, or a mobile deposit was made. As part of this process there are also selection criteria around how transactions are picked, including whether the insights are going to be triggered on a daily, weekly, or monthly basis.
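The two components above can be sketched in a few lines of Python. This is purely illustrative: the merchant names, threshold, and function names are hypothetical examples, not features of any particular product.

```python
# Minimal sketch of insight triggers and rules (all names hypothetical).
from dataclasses import dataclass

@dataclass
class Transaction:
    merchant: str
    amount: float

# Insight trigger: capture a data point when a certain event happens.
def home_improvement_trigger(txn: Transaction) -> bool:
    return txn.merchant in {"Lowe's", "Home Depot"}

# Insight rule: executed when the trigger fires and a business condition is met.
def home_improvement_insight(txn: Transaction):
    if home_improvement_trigger(txn) and txn.amount > 100:
        return f"Large home-improvement purchase at {txn.merchant}: ${txn.amount:.2f}"
    return None

txns = [Transaction("Lowe's", 250.0), Transaction("Corner Grocery", 40.0)]
insights = [i for t in txns if (i := home_improvement_insight(t))]
print(insights)  # only the Lowe's transaction produces an insight
```

In a real system the selection criteria and schedule (daily, weekly, monthly) would sit around this matching logic rather than inside it.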

The combination of the above 2 components can help generate insights, assuming that the nine elements mentioned earlier are satisfied or in place. It would also be advisable to categorize the insights by domain so that they are easier to track and maintain. There is constant mining of data being done in order to generate accurate insights.
AI and ML are used very heavily when generating insights. Their effectiveness becomes more apparent if the underlying data infrastructure is really solid.
The purpose of this blog post is to highlight the importance of the solid data foundations needed to generate valuable insights for business and customers.

Wednesday, October 21, 2020

AI in Mortgage

 AI has been permeating different aspects of life, business, and technology, and more sophisticated implementations of AI are seeing the light of day. There have been gains made with AI in terms of the value-added proposition for different types of business. One of the areas where there has been a lot of discussion and debate about the use of AI is the field of mortgage. There have been a lot of automated tools and chatbots, Quicken's Rocket Mortgage among them, and companies have been trying to implement their own versions of a digital experience in the mortgage space. One of the challenges in mortgage is that the processes are still complex; traditional methods are still being used, and there are a lot of dependencies given the wide range of information that is needed for a mortgage. Three components need to come together in order to implement AI in mortgage: People, Process, and Technology. In mortgage processes, when you apply for a loan or refinance one, a lot of documents are usually needed. The processes for handling these have ranged from sluggish to pretty decent, and they do take quite a bit of time. Apps like Rocket Mortgage and other bank offerings do seem to alleviate some of the pain points in this process. The other approach being utilized to improve process efficiency is moving to cloud platforms, hopefully to streamline the data available from different data sources.

There are a couple of ways to handle AI methods in the mortgage space: one is to develop in-house methods that use AI and ML techniques to automate the mortgage process; the other is to use an API available in an API marketplace to enhance the process. Given the recent developments in AI, Google has come up with an API called Lending DocAI, which is meant to help mortgage companies speed up the process of evaluating a borrower's income and asset documents, using specialized machine learning models to automate routine document reviews. It is good to see companies like Google coming up with industry-specific API offerings that can help improve efficiencies. Expecting to see more on the same lines from other tech companies to solve business problems.

Friday, October 16, 2020

Workflow, Data Masking - Data Ops

 DataOps is becoming more prevalent in today's data-driven projects, due to the speed at which these projects need to be executed while remaining meaningful at the same time. There are tools in the DataOps space that provide a lot of different features; companies like Atlan and Zaloni are very popular in this space, and in fact Atlan was named in Gartner's 2020 DataOps vendors list. Coming to the different features needed in these tools, two concepts are becoming very important: Data Masking and Workflows. It is very well known that in data-driven projects, testing with valid subsets of data is important. One of the biggest challenges faced in data projects today is the availability of test data at the right time in order to test functionality; it usually takes a lengthy process to get test beds ready.

With DataOps tools, one of the promised features is data masking/obfuscation, which means production data can be obfuscated and made available quickly for testing. In the data masking process there is the concept of identifying data elements that are categorized as NPI or confidential and obfuscating those elements. DataOps tools provide mechanisms where masking can be done very quickly, which really helps the process of testing in test environments. The impact becomes more visible when one is working on major projects where testing has to be done through multiple cycles, and also if one is in an agile environment. The leading data analytics expert Sol Rashidi mentions 3 S's - Speed, Scale, and Shareability - as what is expected from data projects apart from providing business value. In order to satisfy these requirements, data masking being made available in DataOps tools is very welcome indeed.
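The masking idea can be illustrated with a minimal sketch, assuming a simple record layout. The field names and the choice of a truncated hash are hypothetical; real tools support many masking strategies (tokenization, shuffling, synthetic values).

```python
# Sketch of field-level data masking for test environments (illustrative only).
import hashlib

# Fields flagged as NPI/confidential in this hypothetical schema.
NPI_FIELDS = {"ssn", "account_number"}

def mask_value(value: str) -> str:
    # Deterministic hash so the same input always masks to the same output,
    # which keeps joins across masked tables consistent.
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def mask_record(record: dict) -> dict:
    return {k: mask_value(v) if k in NPI_FIELDS else v for k, v in record.items()}

prod_record = {"name": "J. Doe", "ssn": "123-45-6789", "balance": "1500"}
print(mask_record(prod_record))  # ssn is obfuscated, other fields pass through
```

Note that a deterministic hash preserves referential integrity but is not reversible, which is usually what a test environment wants.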

The other concept I wanted to discuss here is Workflows in DataOps. When we look at data flow in general, there are source systems, data is collected into a hub/data warehouse, and then data is provisioned out to different applications/consumers. To achieve this, a lot of time is typically spent developing ETL flows, moving data into different databases, and curating the data to be provisioned. This involves a lot of time, cost, and infrastructure. To alleviate these challenges, DataOps tools today introduce a concept called Workflows. The main idea is to automate the flow of data from source to target and, in addition, execute data quality rules, profile the data, and prepare the data for consumption by various systems. Workflows emphasize the importance of data quality checks, which are much more than data validations; they can be customized to verify the type of data that needs to be present in each data attribute. When performing data quality checks in the workflow, the tools also provide the ability to set up custom DQ rules and provide alerts which can be sent to the teams who provide the data. A couple of vendors offer the workflow functionality: Zaloni's Arena product, and Atlan, which has it in their trial offering and hopes to have it in production soon. Working with quality data is fundamental for any data project, and building a good framework with DataOps tools provides the necessary governance and guardrails. Such concepts will go a long way in setting up the quality data platforms that are essential for AI and machine learning initiatives.
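The workflow idea, custom DQ rules applied between source and target with an alert when records fail, can be sketched as follows. The rule names, fields, and the print-based alert are hypothetical stand-ins for what a real DataOps tool would provide.

```python
# Sketch of a source-to-target workflow with custom DQ rules and alerts
# (hypothetical example, not any vendor's API).

def dq_not_null(record: dict, field: str) -> bool:
    return record.get(field) not in (None, "")

def dq_positive(record: dict, field: str) -> bool:
    try:
        return float(record[field]) > 0
    except (KeyError, TypeError, ValueError):
        return False

# Custom DQ rules: each field paired with the check it must pass.
DQ_RULES = [("customer_id", dq_not_null), ("amount", dq_positive)]

def run_workflow(source_records):
    passed, failed = [], []
    for rec in source_records:
        if all(check(rec, field) for field, check in DQ_RULES):
            passed.append(rec)   # ready to provision to target systems
        else:
            failed.append(rec)   # held back for remediation
    if failed:
        # In a real tool this alert would go to the team providing the data.
        print(f"ALERT: {len(failed)} record(s) failed DQ checks")
    return passed, failed

records = [{"customer_id": "C1", "amount": "20.5"},
           {"customer_id": "", "amount": "-3"}]
good, bad = run_workflow(records)
```

The point of the sketch is the separation: the rules are data, the workflow is generic, and failures trigger alerts rather than silently dropping records.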


Tuesday, October 13, 2020

Data Driven Culture/Product Management

 There are 2 topics I see discussed heavily today in my connections/network and at summits/round tables. One is implementing a data-driven culture: how to generate valuable insights using data, applying AI and machine learning. The other is Product Management; there are a lot of sessions/talks about this topic, and a lot of people wanting to become product managers. In a sense it seems that Data Analyst, Data Scientist, and Product Manager are very glamorous titles to have. They are very responsible positions, and care needs to be taken to make sure that one develops the skills needed for these jobs. I would like to dwell a little further on these positions.

A data-driven culture is more easily said than done; it requires a combination of top-down and bottom-up approaches. There has to be a complete embrace of the ideology by leadership, business, and technology. Everyone needs to have an understanding of what needs to be done with the data, the end state of data projects, and, most importantly, the willingness to collaborate. Such a culture would enable better architecting of the infrastructure, good data governance/management, and the ability to choose the right infrastructure and platform. The focus needs to be on the value add rather than just simple cost cutting; there are going to be times where certain transitions cost money but pay off eventually. This also brings up the point of the ability to use AI in a responsible manner.

Since there is a lot of emphasis on data, it also feeds into the aspect of product management. Data can be used very effectively to build products and to get feedback on products. Data can be a strong asset to improve customer experience and also provide the value add behind the features. The type of data being represented in the product, or being used to build it, indicates the importance of data. Data can help with quantifiable measures, which can help in gauging how well the product is doing. There are different ways of getting feedback, like user surveys and hackathons combined with interviews, which can be very useful for product management. Being aware of such techniques helps in grooming oneself for product management. It is a very important role at the intersection of business, customers, and stakeholders.

Product management and DataOps/data-driven culture will increasingly co-exist in the future, so focus on deriving valuable insights from data and on building the data culture to facilitate such initiatives.

Monday, October 5, 2020

Dataops - What is Data Ops...

 We live in a world of metaphors; new terms and metaphors are heard every day, and with them come a lot of confusion, pressure, and also some amount of chaos. It is important to filter out the noise and focus on the needs of the business, customers, and stakeholders. There are continuous attempts to streamline data projects, the reason being that there are a lot of unwanted costs, project delays, and failed implementations. The whole purpose of data projects should be focused on adding value for the business, improving customer experiences, and better integrating systems. In the Agile world, we have heard of DevOps as a way to provide continuous integration and continuous deployment; similarly, there emerged DataOps. What is DataOps?

As defined by Dataops manifesto:
Through firsthand experience working with data across organizations, tools, and industries we have uncovered a better way to develop and deliver analytics that we call DataOps. 
Very similar to the Agile manifesto, there are principles involved around DataOps. To facilitate DataOps, there are tools available in the market today that try to tackle its different aspects. Some of the major areas in DataOps include:

Data Quality - Very important: the ability to perform simple to complex data quality checks at the time of ingestion of data. Data quality needs to be implemented as part of workflows, wherein the data engineer can track the records that were imported successfully and remediate records that failed.

Workflows - Ability to track data from sourcing to provisioning including the ability to profile, apply DQ Checks. Workflows need to be persisted.

Data Lineage - Ability to track how data points are connected from source systems all the way to provisioning systems.

Metadata Management - Categorizing all the different business and logical entities within a value chain, and also having a horizontal view across the enterprise.

Data Insights - Based on the aspects mentioned above, the ability to generate valuable insights and provide business value for customers/stakeholders.

Self Service - DataOps also relies on building platforms wherein different types of personas/users are able to handle their requests in an efficient manner.

Handle the 3 D's - Technical Debt, Data Debt, and Brain Debt. I would like to thank data engineer/cloud consultant Bobby Allen for sharing this concept with me. It is extremely important to handle these while taking up data projects.

Ability to Build and Dispose of Environments - Data projects rely heavily on data; the ability to build environments for data projects and quickly dismantle them for newer projects is key.

It is very important to implement DataOps in terms of the value add for the business and how data will improve the customer experience.

There are tools that implement DataOps; some already in the market are Atlan and Amazon Athena.