Tuesday, December 15, 2020

AI/ML Framework

There is a lot of focus on Artificial Intelligence and Machine Learning in various organizations, and they want to get better insights and value out of such efforts. Significant investments of time and money are made to get AI projects started and delivered. One key aspect to keep in mind: the success of such projects depends on sourcing the right data, having the right data governance, and making sure such efforts align with the business goals. Given such an environment, it is imperative that there is a general framework in place for how AI projects are executed, so that truly value-added deliverables are achieved. In a data world we have the following broad roles trying to make an AI project a success.

Possible Personas in a data project.

1. Data Engineers
2. Data Analysts
3. Data Scientists
4. AI/ML Developers, Model Creators
5. End users/Reviewers of Model for Compliance and regulations

Each of the above personas is going to be involved in different stages of a data project and would be utilizing different tools to achieve the end goal. When we talk about different stages, we can identify the following steps that would be present in any data project:

1. Sourcing data - sample tools: IBM DataStage, Informatica; structured databases: Oracle, SQL Server, Teradata; unstructured data: Instabase
2. Organizing data (metadata management, lineage analysis, cataloging) - Trifacta/Alteryx for data wrangling/prep; Atlan/Collibra for data catalog/governance
3. Building models/experiments/analysis - SAS, Jupyter (Python), R, H2O
4. Quality checks and deployment of ML models - model frameworks such as H2O, Python, R
5. Consumption of data by end users - Tableau, Cognos, MicroStrategy

All of the above require a storage and consumption component; these could be Hadoop/Spark, AWS, Snowflake, Azure, or Google Cloud.
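The five stages above can be sketched as a minimal pipeline. This is an illustrative sketch only: the function names and the tiny in-memory dataset are assumptions, and each function stands in for a whole tool category rather than any specific product mentioned above.

```python
# Minimal sketch of the five data-project stages, using only the
# standard library. Each function stands in for a tool category.

import statistics

def source_data():
    # 1. Sourcing: in practice an ETL tool or a database extract;
    #    here, a small in-memory sample.
    return [
        {"customer": "A", "balance": 120.0},
        {"customer": "B", "balance": None},   # a quality issue to catch
        {"customer": "C", "balance": 310.5},
    ]

def organize(rows):
    # 2. Organizing/prep: drop records that fail a basic completeness
    #    check, the kind of rule a wrangling/governance tool enforces.
    return [r for r in rows if r["balance"] is not None]

def build_model(rows):
    # 3. Building a model/analysis: a trivial "model" - the mean balance.
    return statistics.mean(r["balance"] for r in rows)

def quality_check(model_output):
    # 4. Quality check before deployment: validate the output is sane.
    assert model_output >= 0, "average balance should not be negative"
    return model_output

def consume(model_output):
    # 5. Consumption: a BI tool would chart this; here, format a string.
    return f"Average balance: {model_output:.2f}"

report = consume(quality_check(build_model(organize(source_data()))))
print(report)  # Average balance: 215.25
```

The point of the sketch is the hand-off between stages: each persona's output is the next persona's input, which is why cataloging and lineage across these boundaries matter so much.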

When you organize the different tool sets and personas, you get an overview of what is available and how AI projects could be structured to deliver effective value. For example, if a financial institution has a strong need to consume unstructured data and derive insights from it, one could look at leveraging the Instabase platform. Quoting from the site: "The biggest opportunity for efficiency within a business is stifled by document processes that only humans can do. With Instabase, you can build workflows to access information in complex documents and automate business processes." (https://about.instabase.com/)
Providing a platform where all of these tools can be used, so that the different personas have access and can move data from one point to another, would be of great help for any data project. It would allow consistency of operation, help in tracing data, provide better model validation, and ensure that audit/compliance requirements are met.
Successful data projects that include AI/ML have the above ingredients in the right mix, well tracked and cataloged; they also take care of changes in data over time and closely align with the business objectives.
Happy Holidays and a very happy, pandemic-free New Year 2021. Stay safe, everyone.

Monday, December 7, 2020

Snowflake - UI Components

Snowflake is fast becoming a very important component of cloud migration strategies in the business/technology world today. In my discussions with different leaders, and in participating in different conferences, I have seen a lot of interest in Snowflake. One good thing is that there is a trial period for Snowflake, which one can sign up for on the Snowflake web site. The sign-up process is very straightforward, and once you have it set up, you are given the option to go through introductory material; there is some very good documentation covering the different areas of Snowflake. Once you sign in, you are presented with the main interface. The components listed there look very similar to the SQL Server Management Studio layout, especially the object explorer. The main components in the UI are:

1. Databases - Lists the databases in the Snowflake instance.
2. Shares - Related to data sharing within your organization.
3. Data Marketplace - Allows one to browse the Snowflake Data Marketplace, which lists publicly available data sources across categories like Government, Financial, Health, and Sports.
4. Warehouses - Lists the warehouses available on the Snowflake instance.
5. Worksheets - Where one can write SQL queries. There is a sample database called DEMO_DB with a list of tables that can be used for querying. These queries can also be used for building out the dashboards available in the Data Marketplace option.
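As an illustration of the kind of query one might run in a worksheet, here is a simple aggregation held as a Python string, so it could also be submitted through the snowflake-connector-python driver's cursor.execute(). The table and column names below are assumptions for illustration (modeled on TPC-H-style sample data); substitute objects that actually exist in your instance.

```python
# A worksheet-style query. Object names are illustrative assumptions;
# adjust them to match the tables available in your DEMO_DB.
sample_query = """
SELECT c_mktsegment,
       COUNT(*)       AS customers,
       AVG(c_acctbal) AS avg_balance
FROM   customer
GROUP  BY c_mktsegment
ORDER  BY avg_balance DESC;
""".strip()

print(sample_query.splitlines()[0])  # SELECT c_mktsegment,
```

The same text can be pasted directly into the Worksheets UI; holding it in code is only useful once you want to run queries programmatically.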

One can also load data into Snowflake using the different data loading strategies that are available, which I have discussed in an earlier blog post: https://www.blogger.com/blog/post/edit/2437651727370625818/3834832172078982453