Thursday, February 23, 2012

Power View-SQL Server 2012

One of the areas that has seen a lot of growth and innovation is dashboards. Data visualization is a space where many vendors are offering very innovative and impressive products. Microsoft's reporting with SSRS is good and has improved with Report Builder 3.0, but it needs to broaden its offering to compete with vendors like MicroStrategy (which has now started offering BI in the cloud) and Tableau. In the upcoming SQL Server 2012, Microsoft has introduced a full-fledged ad-hoc reporting tool called Power View: an end-user focused tool that provides easy-to-use ad-hoc reporting and data exploration for users ranging from analysts to information workers.
Here is a link to a tutorial on Power View, which also lists the prerequisites for installing it.
http://social.technet.microsoft.com/wiki/contents/articles/6175.aspx

One can also view sample reports created in Power View by going to this link:
http://blogs.msdn.com/b/microsoft_business_intelligence1/archive/2012/02/07/power-view-overview.aspx

Data visualization is an area which is growing and gaining popularity with information workers.
Sample Power View Report:



Friday, February 17, 2012

SSIS-Data Flow

One of the key aspects of an ETL tool is the flow of data from a source to a destination. As data volumes increase, it becomes more important to tune the data flow components. There are several strategies to tune data flow performance within SSIS; a few tips are listed here, based on Microsoft/MSDN guidance.

For the OLE DB source where data is being pulled from:
OLE DB Source

When you use an OLE DB source to retrieve data from a view, select "SQL command" as the data access mode and enter a SELECT statement. Accessing data by using a SELECT statement performs better than selecting "Table or view" as the data access mode.

The above suggestion is interesting, since it is very common to use the Table or view access mode when pulling data from a source, especially if it is a single table.
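
To illustrate the idea (with a hypothetical view name), rather than picking "Table or view" and letting the source pull every column of the view, switch the data access mode to "SQL command" and supply only the columns the data flow actually needs:

-- Hypothetical example: instead of choosing the view dbo.vSalesOrders via
-- "Table or view", use "SQL command" with an explicit column list so only
-- the required columns flow through the pipeline.
SELECT OrderID,
       CustomerID,
       OrderDate,
       TotalDue
FROM   dbo.vSalesOrders
WHERE  OrderDate >= '2012-01-01';   -- optional filter to reduce the rows pulled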

For destinations, to figure out whether slowness in saving data is caused by the destination itself:

Test the Performance of Destinations


You may find that saving data to destinations takes more time than expected. To identify whether the slowness is caused by the inability of the destination to process data quickly enough, you can temporarily replace the destination with a Row Count transformation. If the throughput improves significantly, it is likely that the destination that is loading the data is causing the slowdown.

SSIS Progress Tab
To enable or disable the display of messages on the Progress tab, toggle the Debug Progress Reporting option on the SSIS menu. Disabling progress reporting can help improve performance while running a complex package in BI Development Studio.


Drop/Create Indexes:
One of the strategies employed during an SSIS data load is to drop and rebuild indexes. First, it is important to figure out the amount of data that will be transferred. Once this is determined, assess whether dropping and re-creating indexes is worthwhile, since rebuilding indexes can take time when the table holds a lot of data. In certain situations it is wiser to rebuild indexes outside the SSIS data load packages, as part of a database maintenance routine.
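
As a minimal sketch of this pattern (the table and index names are made up for illustration), the drop and create statements would typically sit in Execute SQL Tasks placed before and after the data flow:

-- Before the data flow: drop the nonclustered index so the load is not
-- slowed down by index maintenance (hypothetical table/index names).
IF EXISTS (SELECT 1 FROM sys.indexes
           WHERE name = 'IX_FactSales_OrderDate'
             AND object_id = OBJECT_ID('dbo.FactSales'))
    DROP INDEX IX_FactSales_OrderDate ON dbo.FactSales;

-- ... the data flow loads dbo.FactSales here ...

-- After the data flow: recreate the index once the load completes.
CREATE NONCLUSTERED INDEX IX_FactSales_OrderDate
    ON dbo.FactSales (OrderDate);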

Setting DB Options:
It is wise to avoid setting database-level options from within an SSIS package aimed at pure ETL. For example, running DBCC SHRINKFILE or BACKUP LOG from inside an SSIS ETL package should be avoided. These kinds of database operations need to be handled separately by a DBA, who can put them into a database maintenance plan.
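
For reference, these are the kinds of statements that belong in a DBA-owned maintenance job rather than in an ETL package (the database name and backup path below are placeholders):

-- Better suited to a scheduled maintenance job than to an SSIS ETL package.
BACKUP LOG [SalesDW]
    TO DISK = N'D:\Backups\SalesDW_log.trn';

-- Shrinking files should be a deliberate, occasional DBA action,
-- not part of every ETL load.
DBCC SHRINKFILE (N'SalesDW_log', 1024);   -- target size in MB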

I have listed some suggestions and tips which could be helpful while designing Data Flows within SSIS.

Monday, February 13, 2012

SSIS-SQL Server Configuration

During an SSIS training session, the topic being covered was configurations and the different types associated with them. One of the commonly used configuration types is SQL Server configuration. During the demo, the SQL Server connection was chosen (which had the SSISConfigurationsData table) along with the appropriate filter. The remaining steps were performed, and on clicking the Finish button an error appeared:
"Could not complete wizard actions, Cannot insert configuration information into the Configuration table." It was unusual to get this error, because I have not normally run into it while performing SQL Server configurations. I started to research the issue and examined the structure of the SSISConfigurationsData table. The configured values (all the connection strings and variable values) go into a column called ConfiguredValue, whose length was capped at 255 characters. There are two options to address the issue.

1. Change the column length to more than 255 characters, for example to 1000 (see the ALTER TABLE sketch after this list).
2. One of the properties that can be configured is the connection string of a connection manager within an SSIS package. The connection manager has a property called Application Name (SSIS expert Andy Leonard has detailed this property in this link: http://sqlblog.com/blogs/andy_leonard/archive/2012/02/09/ssis-snack-name-those-connections.aspx). If the Application Name is chosen/set, the connection string can end up being quite long, exceeding the 255-character limit on the ConfiguredValue column. Care needs to be taken while setting the Application Name property on the Connection Manager.
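
A minimal sketch of the first option, assuming the configuration table is the SSISConfigurationsData table mentioned above (the package configuration wizard creates the column as ConfiguredValue NVARCHAR(255) by default):

-- Widen the ConfiguredValue column so long connection strings fit.
-- Adjust the table name to match your environment.
ALTER TABLE dbo.SSISConfigurationsData
ALTER COLUMN ConfiguredValue NVARCHAR(1000) NULL;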

Friday, February 10, 2012

SSIS-SQL Server 2012

As SSIS continues to be used and to grow in different business areas for ETL, a lot of improvements have been made in SSIS for SQL Server 2012. The improvements span several areas; one of them is a feature called Data Taps. When working on data flows in SSIS, we use the Data Viewer to get an idea of what data is flowing through the process. One of the issues with the Data Viewer is that the data cannot be saved off to a file location. When doing data analysis or SSIS debugging, it would be nice to save off the data coming through the process. To address this gap, the Data Tap feature has been introduced: using T-SQL, one can programmatically add a tap to any data flow path within an SSIS package. This feature allows one to trap issues occurring in production and perform more effective troubleshooting.
BI Expert Rafael Salas has explained this in more detail in his blog here:
http://www.rafael-salas.com/2012/01/ssis-2012-quick-peek-to-data-taps.html
This is a very neat feature for effective troubleshooting of SSIS packages; the nice thing is that a DBA can add data taps simply by running scripts.
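
As a rough sketch of how a tap might be added with T-SQL using the SSISDB catalog procedures (the folder, project, package, and path names below are placeholders for illustration):

DECLARE @execution_id BIGINT;

-- Create an execution for the package (placeholder folder/project/package names).
EXEC SSISDB.catalog.create_execution
     @folder_name      = N'ETLFolder',
     @project_name     = N'ETLProject',
     @package_name     = N'LoadCustomers.dtsx',
     @use32bitruntime  = 0,
     @reference_id     = NULL,
     @execution_id     = @execution_id OUTPUT;

-- Tap the rows flowing down a specific data flow path; the dump file is
-- written to the DataDumps folder of the SSIS installation.
EXEC SSISDB.catalog.add_data_tap
     @execution_id            = @execution_id,
     @task_package_path       = N'\Package\Data Flow Task',
     @dataflow_path_id_string = N'Paths[OLE DB Source.OLE DB Source Output]',
     @data_filename           = N'CustomerTap.csv';

-- Start the execution; the tapped rows are captured as the package runs.
EXEC SSISDB.catalog.start_execution @execution_id = @execution_id;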

Monday, February 6, 2012

SSIS-Adding Packages...

When working on SSIS projects using BIDS, one can run into situations where multiple SSIS packages need to be added to the solution. Using the standard feature within BIDS, only one package can be added at a time. I recently read this blog post, which shows how to import more than one SSIS package into BIDS.
http://weblogs.sqlteam.com/lucaz/archive/2012/02/02/how-to-import-in-bids-more-than-one-ssis-package.aspx
This is a hidden feature, thanks to Matt Masson for providing this information.

Thursday, February 2, 2012

SSIS-Derived Column

The Data Flow task within SSIS has several components, some of which are very useful when performing ETL operations. One common task is to audit the source data that has been loaded, in terms of the number of rows at the source versus the number of rows that actually got loaded into the destination. The Derived Column transformation in the Data Flow is one of the very useful components that enables one to add more meaningful information to the destination database on top of what is already being imported from the source. The power of derived columns lies in the fact that they can work on the columns provided as inputs to the transformation; they can also work on system and user-defined variables, and columns can be generated based on expressions. Here I would like to show, by example, how a column can be generated based on an expression. For example, if one needs a Status column generated which in turn will be mapped to a column in a SQL Server table, look at the snapshot below.

In the figure above, a column called Status is generated based on the expression "Success". In the following example, derived columns are also generated based on user-defined and system variables:



In the above snapshot, several columns have been derived from variables; all of these can be mapped to a table in, say, a SQL Server database and the data can be loaded. When the Derived Column is connected to an OLE DB Destination, all the derived columns are available for mapping.
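
For context, here is a hypothetical destination table that derived columns like these could map to; the table and column names are illustrative and not from the original package:

-- Illustrative destination table for the derived columns described above.
CREATE TABLE dbo.CustomerLoad
(
    CustomerID     INT            NOT NULL,
    CustomerName   NVARCHAR(100)  NULL,
    LoadStatus     NVARCHAR(20)   NULL,   -- e.g. the literal expression "Success"
    PackageName    NVARCHAR(200)  NULL,   -- from the System::PackageName variable
    LoadStartTime  DATETIME       NULL,   -- from the System::StartTime variable
    BatchID        INT            NULL    -- from a user-defined variable
);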