In today's business world, data is growing at a tremendous rate, and there is a lot of talk about, and use of, Big Data methodologies and tools. For data to be meaningful and effective, it is very important to have data quality and data management standards. A lot of data flows through an organisation, which raises questions that need to be answered: how can it be maintained, how will security concerns be handled, and what value can one get from the data? One of the first things to do is create a data dictionary, which serves as a repository of everything that is sourced, with object-level and attribute-level information. The data dictionary could be part of a metadata database, in which the data dictionary tables are maintained. The data dictionary would contain information about the following areas:
Source of Data
Domain of the Data
Subject/Area of the data
Type of Source: Text Files/Excel/SharePoint/Database
Frequency of Feeds
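
As a rough illustration, the areas listed above could be captured as one record per attribute. This is only a minimal sketch in Python: the class name DataDictionaryEntry, the field names, the find_duplicates helper, and the sample sources are all hypothetical, and in practice these records would live in tables in the metadata database.

```python
from dataclasses import dataclass
from enum import Enum


class SourceType(Enum):
    TEXT_FILE = "Text File"
    EXCEL = "Excel"
    SHAREPOINT = "SharePoint"
    DATABASE = "Database"


@dataclass(frozen=True)
class DataDictionaryEntry:
    """One attribute-level record in a hypothetical data dictionary."""
    source: str            # where the data comes from
    domain: str            # domain of the data
    subject_area: str      # subject/area of the data
    source_type: SourceType
    feed_frequency: str    # e.g. "daily", "weekly"
    object_name: str       # table, file, or sheet
    attribute_name: str    # column or field


def find_duplicates(entries):
    """Return the (object, attribute) pairs that arrive from more than one source."""
    sources_by_key = {}
    for entry in entries:
        # Compare names case-insensitively so "Email" and "email" match.
        key = (entry.object_name.lower(), entry.attribute_name.lower())
        sources_by_key.setdefault(key, set()).add(entry.source)
    return {k: sorted(v) for k, v in sources_by_key.items() if len(v) > 1}


# Made-up sample entries, for illustration only.
entries = [
    DataDictionaryEntry("CRM database", "Sales", "Customer",
                        SourceType.DATABASE, "daily", "customer", "email"),
    DataDictionaryEntry("Marketing spreadsheet", "Marketing", "Customer",
                        SourceType.EXCEL, "weekly", "Customer", "Email"),
    DataDictionaryEntry("CRM database", "Sales", "Customer",
                        SourceType.DATABASE, "daily", "customer", "phone"),
]

duplicates = find_duplicates(entries)
print(duplicates)
```

Here the customer email attribute is flagged because it is sourced from both the CRM database and the marketing spreadsheet, which is the kind of duplication a data dictionary is meant to surface.
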
Capturing the above areas in the data dictionary helps keep track of what is being made available in the system. It helps prevent data duplication and also gives a sense of capacity and of what types of data elements are being used. One key aspect that needs attention is how much storage is being used; tracking this helps plan the capacity of data provisioning systems. In addition to maintaining a data dictionary, data quality standards need to be applied. The data quality checks implemented could range from basic column checks to complex business rules that determine whether the incoming data is good. It would be good to have a group that takes responsibility for data governance and quality. Such standards become extremely crucial as the volume of data keeps growing.
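
To make the range of data quality checks concrete, here is a minimal sketch in Python that runs a basic column check first and then a simple business rule. The function names, the column names (customer_id, amount), and the rule that an amount must be positive are all assumptions made up for illustration; real rules would come from the business and from whoever owns data governance.

```python
def check_required_columns(row, required):
    """Basic check: return the required columns that are missing or empty."""
    return [col for col in required if row.get(col) in (None, "")]


def check_business_rule(row):
    """Hypothetical business rule: the amount must be a positive number."""
    try:
        return float(row["amount"]) > 0
    except (KeyError, TypeError, ValueError):
        return False


def validate(rows, required):
    """Split incoming rows into good rows and rejected rows with a reason."""
    good, rejected = [], []
    for row in rows:
        missing = check_required_columns(row, required)
        if missing:
            rejected.append((row, "missing: " + ", ".join(missing)))
        elif not check_business_rule(row):
            rejected.append((row, "amount must be a positive number"))
        else:
            good.append(row)
    return good, rejected


# Made-up incoming feed, for illustration only.
rows = [
    {"customer_id": "C1", "amount": "25.50"},
    {"customer_id": "", "amount": "10"},
    {"customer_id": "C3", "amount": "-4"},
]

good, rejected = validate(rows, ["customer_id", "amount"])
for row, reason in rejected:
    print(row, "->", reason)
```

Keeping the rejection reason next to each bad row gives the group responsible for data quality something concrete to review, rather than silently dropping records.
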