Many organizations are focused on Artificial Intelligence and Machine Learning and want better insights and value from those efforts. They invest significant time and money to get AI projects started and delivered. One key point to keep in mind: the success of such projects depends on sourcing the right data, having the right data governance, and making sure the effort aligns with the business goals. Given such an environment, it is imperative to have a general framework for how AI projects are executed, so that they produce deliverables that truly add value. In the data world, the following broad roles work to make an AI project a success.
Possible Personas in a data project.
1. Data Engineers
2. Data Analysts
3. Data Scientists
4. AI/ML Developers, Model Creators
5. End users/Reviewers of Model for Compliance and regulations
Each of the above personas is involved in different stages of a data project and uses different tools to achieve the end goal. Any data project is likely to include the following stages:
1. Sourcing data - Sample tools: IBM DataStage, Informatica; structured databases: Oracle, SQL Server, Teradata; unstructured data: Instabase
2. Organize data/Metadata Management/Lineage Analysis/Cataloging - Trifacta/Alteryx for data wrangling/prep; Atlan/Collibra for data catalog/governance
3. Build Model/Experiments/Analysis - SAS, Jupyter (Python), R, H2O
4. Quality Check, Deployment of ML Models - Model frameworks such as H2O, Python, R
5. Consumption of Data by End Users - Tableau, Cognos, MicroStrategy
All of the above require a storage and consumption layer, which could be Hadoop/Spark, AWS/Snowflake, Azure or Google Cloud.
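One way to keep this overview handy is to write the stages and their sample tools down as data. The following is only a minimal sketch in Python: the names `Stage`, `STAGES`, `stages_using` and `overview` are made up for illustration, the stage names are shortened, and the tool lists are just the examples listed above, not a complete inventory.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    """One stage of a data project with its sample tools (illustrative only)."""
    order: int
    name: str
    tools: Tuple[str, ...]


# Stage names are abbreviated from the list above; tools are the listed samples.
STAGES = (
    Stage(1, "Sourcing data",
          ("IBM DataStage", "Informatica", "Oracle", "SQL Server", "Teradata")),
    Stage(2, "Organize data", ("Trifacta", "Alteryx", "Atlan", "Collibra")),
    Stage(3, "Build model", ("SAS", "Jupyter", "R", "H2O")),
    Stage(4, "Quality check and deployment", ("H2O", "Python", "R")),
    Stage(5, "Consumption", ("Tableau", "Cognos", "MicroStrategy")),
)


def stages_using(tool: str) -> List[str]:
    """Return the names of the stages that list the given tool (case-insensitive)."""
    wanted = tool.casefold()
    return [s.name for s in STAGES if any(t.casefold() == wanted for t in s.tools)]


def overview() -> str:
    """Render the stages as a numbered plain-text overview, one stage per line."""
    return "\n".join(f"{s.order}. {s.name}: {', '.join(s.tools)}" for s in STAGES)


if __name__ == "__main__":
    print(overview())
    print(stages_using("H2O"))
```

A lookup like `stages_using("H2O")` shows at a glance which tools span more than one stage, which is often where hand-offs between personas happen.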
Organizing the different tool sets and personas this way gives an overview of what is available and how AI projects could be structured to deliver effective value. For example, if a financial institution needs to consume a lot of unstructured data and derive data from it, one could look at leveraging the Instabase platform. Quoting from the site:
The biggest opportunity for efficiency within a business is stifled by document processes that only humans can do. With Instabase, you can build workflows to access information in complex documents and automate business processes. https://about.instabase.com/
A platform where all of these tools can be used, so that the different personas have access and can move data from one point to another, would be of great help to any data project. It would allow consistent operation, help in tracing data, provide better model validation, and make sure any audit/compliance requirements are met.
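To make the "tracing data" part a little more concrete, here is a small Python sketch of a hand-off log. It is not how any of the tools above implement lineage; `Handoff`, `LineageLog` and the dataset names are hypothetical, and the sketch assumes each dataset has exactly one upstream dataset.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Handoff:
    """Records who produced a dataset, with which tool, and from what."""
    dataset: str
    source: Optional[str]  # upstream dataset name, or None for raw sourced data
    persona: str
    tool: str


class LineageLog:
    """Hypothetical in-memory log; a real platform would persist this."""

    def __init__(self) -> None:
        self._by_dataset: Dict[str, Handoff] = {}

    def record(self, handoff: Handoff) -> None:
        """Store a hand-off; each dataset name may be recorded only once."""
        if handoff.dataset in self._by_dataset:
            raise ValueError(f"dataset already recorded: {handoff.dataset}")
        self._by_dataset[handoff.dataset] = handoff

    def trace(self, dataset: str) -> List[Handoff]:
        """Walk from a dataset back to its raw source, newest hand-off first."""
        chain: List[Handoff] = []
        seen = set()
        current: Optional[str] = dataset
        while current is not None:
            if current in seen:
                raise ValueError(f"cycle detected at: {current}")
            seen.add(current)
            if current not in self._by_dataset:
                raise KeyError(f"no lineage recorded for: {current}")
            handoff = self._by_dataset[current]
            chain.append(handoff)
            current = handoff.source
        return chain


if __name__ == "__main__":
    log = LineageLog()
    log.record(Handoff("raw_loans", None, "Data Engineer", "Informatica"))
    log.record(Handoff("clean_loans", "raw_loans", "Data Analyst", "Trifacta"))
    log.record(Handoff("loan_scores", "clean_loans", "Data Scientist", "H2O"))
    for step in log.trace("loan_scores"):
        print(f"{step.dataset} <- {step.source} ({step.persona}, {step.tool})")
```

With a record like this, a reviewer checking a model for compliance can walk back from the model output to the raw source and see which persona and tool touched the data at each step.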
Successful data projects that include AI/ML have the above ingredients in the right mix, keep them well tracked and cataloged, account for changes in data over time, and align closely with the business objectives.
Happy Holidays and a very happy, pandemic-free New Year 2021. Stay safe, everyone.