Thursday, September 12, 2013

Data Dictionary

In today's world of data management, business intelligence, data warehouse the amount of data is growing along with the amount of attributes being captured. It becomes very important to have proper data management principles in place. One of the aspect is data dictionary, it is very important to persist the information which would make it easier for users to access. This would enable in finding out what attributes are being made available and where they are sourced from. I would like to explain one of the ways it has been done projects I have been associated with. First thing is there is a template that would capture information regarding source systems in terms ( Database, Table,Server). In addition to this would where the table would end up, sometimes there is a landing area where information first comes in  and then the data would be moved into the user databases. The information pertaining to Landing databases, tables, server need to be captured. At this stage one would identify key columns, if the attributes are NPI/Non NPI, if the attributes are Nullable/Non Nullable. Once this is done in the template we would capture the data type/scale information. One of the important in the template would be how each table would be loaded. A table can be loaded in full meaning it is refreshed everyday or we get changed information , only record that changed. Once all of this information is captured in a template (in excel), the information can be validated using a set of rules (database standards). One of the ways which this can be done is through macros in excel (assuming the template is maintained in this format), any errors would be displayed across each entry in the DD.  Once the errors are fixed, the macros can be used to generate SQL Statements. These SQL Statements can be used to persist meta data information in the database. I have found the concept of data dictionary and persisting them in a database for Data Integration Projects.

Key Information:
Source Server/Database/Tables
Landing Server/Database/Tables
User Database/Server/Tables
Column data type/scale/size information
Key Colums/NPI/Non NPI columns
Type of Data Load.

No comments:

Post a Comment