Monday, January 11, 2021

Data Changes/Data Quality-AI/ML

With AI/ML seeing increased adoption in organizations, data is fast becoming the most sought-after asset. The flow of data is becoming more continuous, like water flowing from an open tap. One of the trends I have noticed in data projects is that once the desired goal is achieved, there is a tendency to think the work is done. This is where the problem starts: data is not static, it changes continually. When data changes, it can affect the underlying structure or grain of the tables used in AI/ML projects. If these changes are not detected and tracked early, they can cause the AI/ML models to generate inaccurate results. Models that performed very well three months ago are then no longer accurate.

Tracking data changes, whether a format change in a vendor feed or a schema change in the source, is very time consuming and largely manual. There is a good opportunity to treat this as a problem area and see whether the gap can be addressed with automated solutions. For infrastructure and related components we have plenty of utilities and dashboards that indicate when certain parameters are reaching a limit and adjustments are needed. By the same analogy, we need tools that tell data teams and business users about any changes happening in the data. This would let data teams spend more time tuning and optimizing the data models and less time on manual, inefficient tasks. Handling data changes is critical for product management, given its increasing reliance on data. This is a somewhat abstract post, but these are high-level patterns that I have seen in data-related projects.
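
To make the idea of automated tracking a bit more concrete, here is a minimal sketch in Python of a schema check that compares what a pipeline expects with what actually arrived. Everything in it is an assumption for illustration: the function names `infer_schema` and `diff_schemas`, the column names, and the choice to represent a schema as a plain mapping of column name to type name. A real tool would read the schema from the source system or a catalog rather than inferring it from sample rows.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical representation: column name -> type name (e.g. "int", "str").
Schema = Dict[str, str]


def infer_schema(rows: Iterable[Dict[str, Any]]) -> Schema:
    """Infer a simple schema from sample rows.

    The type of the first value seen for each column wins.
    """
    schema: Schema = {}
    for row in rows:
        for column, value in row.items():
            schema.setdefault(column, type(value).__name__)
    return schema


def diff_schemas(expected: Schema, observed: Schema) -> Dict[str, List[str]]:
    """Report columns that were added, removed, or changed type."""
    expected_cols = set(expected)
    observed_cols = set(observed)
    return {
        "added": sorted(observed_cols - expected_cols),
        "removed": sorted(expected_cols - observed_cols),
        "retyped": sorted(
            col
            for col in expected_cols & observed_cols
            if expected[col] != observed[col]
        ),
    }


# Illustrative data only: the schema the model was built against,
# and a new batch from a (made-up) vendor feed.
expected_schema = {"order_id": "int", "amount": "float", "region": "str"}
new_batch = [{"order_id": 1, "amount": "19.99", "channel": "web"}]

changes = diff_schemas(expected_schema, infer_schema(new_batch))
if any(changes.values()):
    print("Schema drift detected:", changes)
```

Run on every incoming batch, a check like this could feed an alert or a dashboard, in the same way infrastructure monitoring flags a parameter that is approaching its limit.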