Friday, February 17, 2012

SSIS-Data Flow

One of the key aspects of a ETL tool is the flow of data from the source to a destination. As the data volume increases it becomes more important to tune the data flow components. There are several strategies to tune the data flow performance within SSIS. Here are a few tips listed here. As per Microsoft/MSDN suggestion

when one has OLE DB source where data is being pulled from:
OLE DB Source

When you use an OLE DB source to retrieve data from a view, select "SQL command" as the data access mode and enter a SELECT statement. Accessing data by using a SELECT statement performs better than selecting "Table or view" as the data access mode.

The above suggestion is interesting since it is very common to use the Table/View access mode while pulling data from source especially if it is single table.

For destinations, to figure out if the slowness of data being saved is caused by destination:

Test the Performance of Destinations


You may find that saving data to destinations takes more time than expected. To identify whether the slowness is caused by the inability of the destination to process data quickly enough, you can temporarily replace the destination with a Row Count transformation. If the throughput improves significantly, it is likely that the destination that is loading the data is causing the slowdown.

SSIS Progress Tab
To enable or disable the display of messages on the Progress tab, toggle the Debug Progress Reporting option on the SSIS menu. Disabling progress reporting can help improve performance while running a complex package in BI Development Studio.


Drop/Create Indexes:
One of the strategies employed during SSIS data load is to drop and rebuild indexes. First it is important to figure out the amount of data that would be transferred. Once this is determined an assessment needs to be done as to whether it is important to drop and re-create indexes, since recreating indexes can take time in case the data in the table is large. In certain situations it would be wise to have a rebuild of indexes on tables outside the SSIS data load packages, built it out as part of a DB maintenance routine.

Setting DB Options:
It would be wise to avoid setting database level options from within a SSIS package aimed at pure ETL. For example running DBCC SHRINKFILE from with in an SSIS ETL package, also trying to BACKUP LOGS from with a SSIS package, all of these must be avoided. These kind of DB options need to be handled Separately by A DBA who can put together a Database Maintenance Plan.

I have listed some suggestions and tips which could be helpful while designing Data Flows within SSIS.










No comments:

Post a Comment