Today, in many organizations, especially those rolling out data projects or selling data as a product, one of the important aspects to consider is the data science life cycle. It becomes very important when one is trying to use insights to drive the business forward. This could be in the following areas (I just picked a sample):
1. Revenue Generation
2. Customer Experience
3. Customer Attrition
4. Increased Engagement
Here are some of the key steps in a data science (DS) life cycle:
1. Identification of Data Sources - Structured and Unstructured Data
2. Data Extraction
3. Data Storage (Persistence of Data) - What type of data storage to use: On-Prem, Public Cloud, Private Cloud or Hybrid Cloud.
4. Data Engineering - Data Profiling, Data Quality Checks, Cleansing of Data, Data Transformation, Handling of Bad Data from Source.
5. Data Science Layer - This includes the following:
a. Data Transformation - Addition of Categorical Variables
b. Feature Engineering - Enhance the data set to make it more meaningful - requires domain expertise.
c. Model Training, Testing and Validation
d. Deployment of Models to Production
e. Documentation of Models
f. All of the above steps could be iterative as new data keeps coming in
6. Data Delivery - How data is going to be sent to end users and stakeholders
a. Method of Data Delivery - Web Services, Visualizations, Reports
b. Actionable Insights - Action needs to be taken on the insights provided. In organizations today, there is often an issue of these insights not even being looked at. How an organization goes about addressing this is very important.
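As a rough illustration of steps 4 and 5a, here is a minimal Python sketch that runs quality checks on incoming records, sets bad records aside with the reason for rejection instead of dropping them silently, and then one-hot encodes a categorical variable. The field names (`customer_id`, `plan`, `monthly_spend`), the sample records, and the validation rules are all hypothetical and chosen only for illustration; a real pipeline would have its own rules and would likely use a data-frame library.

```python
# Hypothetical raw records as they might arrive from a source system.
RAW_RECORDS = [
    {"customer_id": "C1", "plan": "basic", "monthly_spend": "20.5"},
    {"customer_id": "C2", "plan": "premium", "monthly_spend": "99"},
    {"customer_id": "", "plan": "basic", "monthly_spend": "15"},       # missing id
    {"customer_id": "C4", "plan": "premium", "monthly_spend": "n/a"},  # bad number
]


def check_record(record):
    """Return a list of quality problems found in one record (empty if clean)."""
    problems = []
    if not record.get("customer_id"):
        problems.append("missing customer_id")
    try:
        if float(record.get("monthly_spend", "")) < 0:
            problems.append("negative monthly_spend")
    except (TypeError, ValueError):
        problems.append("monthly_spend is not a number")
    return problems


def split_good_and_bad(records):
    """Separate clean records from bad ones, keeping the reasons for rejection."""
    good, bad = [], []
    for record in records:
        problems = check_record(record)
        if problems:
            bad.append({"record": record, "problems": problems})
        else:
            # Cleansing step: store the spend as a number rather than a string.
            good.append({**record, "monthly_spend": float(record["monthly_spend"])})
    return good, bad


def one_hot_encode(records, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in records})
    encoded = []
    for r in records:
        row = {k: v for k, v in r.items() if k != column}
        for category in categories:
            row[f"{column}_{category}"] = 1 if r[column] == category else 0
        encoded.append(row)
    return encoded


good, bad = split_good_and_bad(RAW_RECORDS)
features = one_hot_encode(good, "plan")
print(f"{len(good)} clean records, {len(bad)} rejected")
for row in features:
    print(row)
```

Keeping the rejected records together with their problems, rather than discarding them, makes it possible to report bad data back to the source system.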
Data Lineage/Governance - This step spans all of the steps mentioned here. It includes the data catalog, the metadata repository, data value chains, and capturing important data assets, as well as how data is represented across the enterprise.
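To make the idea of a data catalog with lineage a little more concrete, here is a minimal Python sketch. The `DatasetEntry` structure, the `trace_lineage` helper, and the dataset names are all hypothetical; real catalog and metadata tools track far more than this, so treat it only as an illustration of recording where a data asset comes from.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One entry in a toy data catalog (hypothetical structure)."""
    name: str
    owner: str
    description: str
    # Names of the datasets this one is derived from.
    upstream: list = field(default_factory=list)


def trace_lineage(catalog, name):
    """Return every upstream dataset of `name`, nearest first, without duplicates."""
    seen = []
    queue = list(catalog[name].upstream)
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        if current in catalog:
            queue.extend(catalog[current].upstream)
    return seen


catalog = {
    entry.name: entry
    for entry in [
        DatasetEntry("crm_export", "sales-ops", "Raw customer export from the CRM"),
        DatasetEntry("billing_events", "finance", "Raw billing events"),
        DatasetEntry("customers_clean", "data-eng", "Cleansed customer table",
                     upstream=["crm_export"]),
        DatasetEntry("churn_features", "data-science", "Features for the attrition model",
                     upstream=["customers_clean", "billing_events"]),
    ]
}

print(trace_lineage(catalog, "churn_features"))
```

With entries like these, anyone looking at a model's feature set can walk back to the raw sources that feed it and see who owns each one.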