I had discussed different types of dimensions in my earlier posts. One of the aspects which is involved in ETL/Data warehouse building is the use of Surrogate keys. When data is pulled from source systems there are natural keys which are available. In a data warehouse building environment surrogate keys are used in addition to natural keys from the source systems. The reason for using surrogate keys in data warehousing is because data can originate from different systems, the use of surrogate keys enables one to maintain consistency in the data warehouse and these keys can be used to join dimension tables and fact tables. One of the common ways to generate surrogate keys is the use of IDENTITY columns. This would work as long as the amount of data is limited, the moment the data starts to scale and there could be issues with using the IDENTITY column approach. There are vendors who offer SSIS components which take care of surrogate key management. Pragmaticworks offers a product called Task factory, one of the components available is called Dimension Merge SCD Transform (v2.0), this component has in built Surrogate key management. Here is a link for the product: http://pragmaticworks.com/Products/Business-Intelligence/TaskFactory/Features.aspx. One could come up with Custom solutions based on the business needs.
Post a Comment