Tuesday, December 10, 2013

Data Integration-Data Readiness

Currently I have been working on lot of Data Integration projects, this involves lot of sourcing of data from various system. While sourcing data one of the key aspects to find out if data is ready at source. There are also situations where some other systems could be using your system as a source. In both of the situations it is very important to come up with a Data Readiness strategy. It is extremely important to have a good sourcing strategy since there are SLA's that are to be met and the Business needs data more than ever before to make good decisions. What is involved in Data Readiness? The source systems need to come up with a plan to have data readiness indicators once all the objects/tables have been updated. In my projects the Data Readiness is table driven, normally there is a table which would have the following attributes:
DatabaseName
Date
Schema
TableName/Object
CompletedFlag
LoadStartDate
LoadEnDate
UpdateDateTime
SLATime
When the source tables are updated with data and all the data related operations are complete, there would be a process to update the DataReadiness table which update all the above attributes and set the CompleteFlag Indicator = 'Y'. The Systems which would need data from the source would keep polling the DataReadiness table to check the CompletedStatus and also the UpdateDatetime. The reason to check the UpdateDatetime column is to make sure that the system that is pulling data from the source is trying to get the most recent update on the source system. The Data Readiness layer allows a level of abstraction in the sense that systems requiring data need not check individual source tables. The Data Readiness table can be used to indicate when the jobs need to be run to pull the actual data. In case Target systems are lagging behind with respect to the data update, the Data Readiness layer can be used to catch up with the latest updates on the source systems.

No comments:

Post a Comment