Monday, December 31, 2012

PowerPivot-SQL Server Data Import...

I would like to take this opportunity to wish all of my blog readers/visitors a very happy new and prosperous new year 2013. In the coming year I am sure there would be a lot of paradigm shifts/new methodologies/big data analytics adoption. I hope it is a good prosperous year for all of us. In this blog post I would like to write about PowerPivot, more importantly how to import data from SQL Server. The PowerPivot Download is available in the following website: In the link mentioned there are also of lot of demonstration videos regarding the use of PowerPivot.
In order to install PowerPivot, one needs to have the Excel 2010 Add On, in case one has Excel 2010 as part of Office 2010 that should be sufficient as well. Once PowerPivot is installed and one opens Excel 2010, there is a PowerPivot Option at the end. When PowerPivot Option is choosen, this needs to be done in order to work with PowerPivot, one can see the screen captured below.

Once the above screen appears, the user needs to click on the PowerPivot Window icon, this will bring up the next screen. In that screen, in the From Database option choose From SQL Server. This will allow one to connect to SQL Server database and choose the tables that needs to be used for PowerPivot reports.

Customer stories:

Wednesday, December 12, 2012

In Memory Analytics...

In the past couple of years there has been a lot of growth in the different type of Analytics that is being performed at various organizations. There has been a tremendous requirement for companies to be very Agile and have a speed to market to introduce their products and services. This has resulted in lot of exciting requirements for the BI Analytics domain to fullfill. In the SQL Server space, Microsoft first introduced inMemory analytics through PowerPivot. Now with SQL Server 2012 we have a new type of Analytics called the Tabular Model which uses InMemory Analytic capabilties. This is different from the SSAS model of analytics. These days the end users/Business users have been provided with tools to harness the power of inMemory Analytics. The couple of in Memory technologies available in SQL Server 2012 are  xVelocity Analytics, xVelocity Column Store and Power Pivot. Quoting the Microsoft web site(,xVelocity is a family of memory-optimized technologies spanning Analytical, Transactional, Streaming and Caching data. Designed for industry standard hardware, xVelocity is a built-in capability in SQL Server 2012, SQL Server Parallel Data Warehouse, Office Excel 2013 and Windows Azure. Please read the following the blog post in the link provided below about a very fast inMemory analytics project from Microsoft, this is still in Private customer Preview.

Tuesday, December 4, 2012

Data Abstraction-Data Integration

In my earlier blog post I discussed about data integration and how it can provide unified view of disparate data sources. In today's world of data warehousing/data analytics it is becoming increasingly common to have disparate data sources and there are lots of ETL projects which are aimed at consolidating the data into data marts and data warehouses. In today's blog post I would like to introduce a new product called Composite Software which helps in Data Integration and Provides Data Abstraction. It helps the business to perform Data Virtualization. This provides the business with following benefits:

•Simplify information access – Bridge business and IT terminology and technology so both can succeed.

•Common business view of the data – Gain agility, efficiency and reuse across applications via an enterprise information model or “Canonical” model.
•More accurate data –Consistently apply data quality and validation rules across all data sources.
•More secure data – Consistently apply data security rules across all data sources and consumers via a unified security framework.
•End-to-end control – Use a data virtualization platform to consistently manage data access and delivery across multiple sources and consumers.
•Business and IT change insulation –Insulate consuming applications from changes in the source and vice versa. Business users and applications developers work with a more stable view of the data. IT can make ongoing changes and relocation of physical data sources without impacting information users.

Please use the following link to know more about the software:

The developers can work with the Composite Studio to develop the views which would be sourced from different data sources. In a nutshell Composite software is a good tool for perform Data integration and Data Abstraction.

Wednesday, November 28, 2012

SQL vs NoSQL-Discussion...

These days there is a lot of implementations which have started in the Big Data Space/Not Only SQL Space. There is a lot of debate about which is better and how one would replace the other. One of the more interesting discussions that I have been hearing is that both SQL/Not only SQL (NoSQL) have their space. A system that combines the advantages of both the paradigms stand to gain the most I think. Not only SQL can be used a medium to get sense of the vast amounts of unstructured data, once it is in a consumable form it could be integrated with other SQL Systems. There is a interesting blog post about cons of NoSQL (Not Only SQL), of course what is also interesting is the comments made by NoSQL users below the blog post. It is a very interesting discussion.
One tends to benefit from understanding what is each systems strengths and weakness and how both can complement each other.

Friday, November 16, 2012

Data Integration & Virtualization...

In today's BI world there is data pulled from variety of Data sources. Once the data is pulled from the source they need to housed is what is called a Operational Data Store. Once data is housed here , then data is transformed/modelled for different reporting and analytical purposes. One of the key points to be considered is the cost of storage and the support personnel needed to maintain these data stores. The ETL's to perform such pulls can also be very complicated in certain cases. In the recent times the concept of Data Integration and federation has evolved and is being experimented in Master data management. One of the frequent requirement today in the business world today is that there is a need for a unified view of disparate data sources. Currently this requirement is being handled by an ETL Based approach and building data warehouses on top of them, at the same time tying disparate data sources together can be a challenging process. This is where the concept of data integration architecture kicks in, one can look at the following link to get an overview of data integration and data reingetration methodology:
This is an area to watch out for in the coming years as more and more disparate data sources are being used for data analysis.

Monday, November 5, 2012

SSIS 2012-Deployment of Packages...

In SSIS 2012 there are two types of deployment Project based deployment and Package based Deployment. The package based deployment follows the procedure/process in SSIS 2008. The more preferred approach in SSIS 2012 would be the project based deployment. Let us say for example we have a package we need to deploy in SSIS 2012, In SQL Server Data Tools, open the SSIS project/solution and then right click on the Solution Name and choose Deploy. A Integration Services Deployment wizard opens with the list of steps that need to be completed for a successful deployment of the package. At first select the Integration Services Project that need to be deployed. In the next screen, the user is prompted for 2 pieces of information. The first one is the Server name, this would be SQL Server 2012  where the package is to be deployed. The second one is the path for the SSISDB which is the Integration Services Catalog on the SQL Server 2012.

Here if the SSISDB is not present on the server, one needs to created the SSISDB manually. This can be done in SQL Server Management Studio by right clicking on the Integration Services Catalog. Choose the New Catlog Option, the Create Catalog Window opens. In this window one needs to check the Enable CLR Integration so that the catalog can be used. By default the name of the Catalog database is set to SSISDB. There needs to be password entered for the SSISDB catalog database, the reason being the catalog protects data using encryption. Once this complete the Catalog database is complete.

Now going back to the Deployment wizard, the path to the catalog database needs to be entered, it would be something like:
Once the information is completed on this screen, the choices can be reviewed in the next screen, then once we proceed to the next screen, the project is ready to deployed and one can see the progress. On successful completion of deployment one can see the package being deployed in the SSISDB catalog.

Sunday, October 28, 2012

SQL vs Hadoop, Hive...

Yesterday I had the opportunity to attend the SQL Saturday Event in Charlotte. It was a packed day with lot of topics, all of them focused on the Business Intelligence Space. For me the day started of with a session on Upgrading to SSIS 2012 by Paul Rizza From Microsoft, it was a very informative session with lots of pointers towards what needs to be looked at in SSIS 2012, I am planning to write a more detailed post on this topic later.  One of the important lectures I attended was on Hadoop vs SQL and how Hadoop can be utilised in Data Warehouses. As expected it was a packed session with folks looking forward to what is Hadoop and see what all the hype was about. Hadoop targets the unstructured data space and has mechanism to plow through the huge amounts of unstructured data that is available these days. Companies today are trying to make meaningful dissection of unstructured data and help the strategic direction of the company. The presentation was provided by Rob Kerr (CTO with BlueGranite) and it was full of valuable information. One of the differences bet ween SQL and Hadoop is that SQL follows the Schema Write methodology, meaning first the table structure or the schema is created, then data is inserted into the tables. In Hadoop the Schema Read methodology is followed meaning as the data is loaded the schema is determined at run time which allows it to tackle unstructured data. The mechanism that is followed to decipher unstructured data is that it uses the MapReduce methodology. During the Map() function the data is being assigned to different nodes by a central node, then after that Reduce() function kicks to parse the data, once the data parsing is done the data from the different nodes is sent back to the requester. The other difference between SQL and Hadoop is that SQL requires consistency and structure follows the ACID principle, Hadoop does not require the structure and focuses on data availability, Hadoop framework also processes information in a batch.

It is very much possible that both SQL and Hadoop can coexists in Datawarehouse environment. For example let us say one has a datawarehouse regarding call center data which houses all the structured information, at the same time there is lot of data captured in a call center to chats/online mechanisms. Instead of shedding away the unstructured data, one can use Hadoop to parse through the unstructured data, look for keywords that would matter to the business and store it. One can work on integrating Hadoop with the datawarehouse and get unstructured data to be fed into the Datawarehouse. To get more information about the presentation about Hadoop and SQL please use the following link:
There are other links in the above site which provide more information about Hadoop and SQL.

There a layer on top of Hadoop where in users can use sql to get information about unstructured data that is through a tool called Hive. Hive sits on top of Hadoop layer which can accept sql commands. Once the sql command is entered by the user the query is passed on to the MapReduce mechanism of Hadoop. Once hadoop receives the request and data is parsed out the result then returned back to the Hive Layer where data is presented back. Please use the following link to learn more about Hive:

Here are some examples from the above site:

Creating Hive tables and browsing through them

hive> CREATE TABLE pokes (foo INT, bar STRING);
Creates a table called pokes with two columns, the first being an integer and the other a string

Tools are also getting better with the Hadoop framework which is making it better to get meaningful data out of unstructured information.

To summarise after the session I felt that i gotten better information Hadoop and how the framework can coexists with SQL and utilise the information embedded in unstructured data.

Thursday, October 18, 2012

SQL Server DB Compare Tool...

When working on database/data warehousing projects one gets to work on multiple environments Development,QA,UAT and Production. One of the tasks that is usually done very frequently is that of comparing database objects. In a database/datawarehosuing project the database objects get constantly moved from one environment to another environment and very often there is schema's on the different environments need to be compared to see if there are any differences.I was referred to a tool by a SQL Server blog Contributor/Developer Imran Mohammad. I decided to check the tool out as I am currently doing lot of data sourcing projects where in I have to do lot of database comparisons. The name of the tool is called SQLDBDiff the link for the tool is given below: In the website there are two versions of the tool listed, the freeware and shareware version. The site lists the differences between the two versions, I decided to download the freeware version.
The freeware version is in the form of .zip file and I extracted the files and ran the Compare Databases Tool (the file is called SQLDBDiff). In the Menu Option File one has to choose Compare two databases, this launches a window where one can connect to the source and target databases. Here the source and target databases are the databases where tables and other objects need to be compared. Once these values are filled out there is Options button where in one can check what type of database objects need to be compared.

There are a good set of Preferences ranging from tables to Server Properties. There is a Compare options tab where in one can choose options like whether system objects, statistic indexes need to be included for comparison. Once these are all set click the OK button to start the comparison, once the comparison is complete there is a set of tab based windows which appear with the list of objects which have been compared. In each row of the screen one can double click to see the details of the comparison and there is a status column which tell the user the result of the comparison (whether it was the same (green) or not (red)). The freeware version should be good enough for one to get started, in case there are more detailed features like Generate schema and Data synchronization scripts one can get the Shareware version.

Friday, October 12, 2012

SSRS-Job Monitoring

When one is working in production support handling montoring of jobs, one of the key things that is required is notification of failures/errors. Once notified the next important aspect would be how quickly can the errors be resolved. In today's world of business trying to be lean/agile and execute quickly, turn around times can be very important. Let us for example take a scenario where one is monitoring jobs (which can have mutliple steps ranging from SQL Scripts,SSIS) and there are notifications when there are failures. There are managed environments where only DBA's would have access to job failures in such cases one would have to depend on the DBA's get detailed information about failures, of course building in error logic would help. The developer's would then need to act on the failures and put in the appropriate fixes. One of the ways the trunaround times can be improved is by using SSRS reports that can use the jobs tables present in msdb database. The first SSRS report would be a summary report which would provide summary information of the jobs. The data elements for this report would range from Job Name,Frequency to getting NextRunDateandTime. This report can be generated by using a SELECT statement querying the sysjobs table and sysjobschedule table. One of the important steps that need to be performed during the design of this report is to have a drilldown on the JobName column, the purpose being to get details of the job along with step and failure information. The second report that needs to be built is the Jobs Detail report which would provide step information within the job along with messages associated with the step, this would provide valuable information for the developer to trouble shoot the problem. The main tables that would be involved are:


msdb.dbo.sysjobschedules (this focusses on the next run date and time for each job)
msdb.dbo.sysschedules (this has schedules for all the jobs)

One of the key tables involved here is sysjobhistory, the message column in this table would provide useful information, this table is connect to sysjobs and sysjobsteps based on job id and step id. The sysjobhistory also contains run date,run time and run duration. The date, time and duration would need to be formatted based on user needs. Once the query is built out and attached to the report, the detail report needs to be referenced in the hyperlink for the Job id in the first report. The detail report would take the Job Name as the parameter. The reports can then be deployed to the SSRS server for the developers to use. One of my team members bought up this idea and it has benefitted our dev team immensely.

Friday, October 5, 2012

Datawarehouse-Data Sourcing...

In the BI/Data warehousing Data is of paramount importance and of course how the data has to be modelled and used for reporting is an another important task, since how one uses the data would really benefit the business. Since I mentioned about Data, it is very critical that the right and what needs to be consumed is bought into a Data warehouse. Use of redundant/unwanted data could result in unnecessary usage of space and support which can drive up the cost of doing business. Data Sourcing is an important task in life cycle of a good data warehouse/BI system. Data could be coming in from multiple sources as per requirements of the business. It is very important to have a Data Sourcing strategy since that will enable technology groups to build more meaningful data warehouse systems. I did not realise how important the paradigm of data sourcing is till I started working on a Project. I would like to list different aspects that need to be considered for data sourcing. The factors listed below is of course going to vary on the type of business one is supporting.

What type of data source is the data being sourced from (Databases/Flat Files/CSV/Spreadsheets/Mainframe/NoSQL Data Sources, to name a few...)

What type of mechanism is being used for Data Sourcing: (Is it going to be a Pull Mechanism/Push Mechanism : Depending on the type of mechanism being used , the type of handshake process needs to be clearly established between the source and target systems)

One aspect which is very important is the frequency of the data feeds into the target systems. How well can the target system handle the volume of data coming in. Is there sufficient capacity to handle the load.

It is very important that the users who are going to consume the data be engaged in Data sourcing activities. Since it is for the business eventually the data is sourced. There needs to be very good analysis of requirements which would enable the technology group to determine what attributes need to be fed into the system. This is the stage where lot of data mapping exercises could be potentially performed.

It is important to determine what the needs of users are, is it going to be real time reporting or archival type of reporting. This is going to be closely linked with the frequency of the data coming, the type of database architecture in place. As one can see from this, how all of the different aspects of databases and system architecture is closely related to each other.

Since we are dealing with Data transport, one needs to factor in what happens when there are errors, how does one recover from it. There needs to be focus on Data reconciliation and how source system would support it.

I have listed some of the key points related to data sourcing, there is always room for further discussion this topic which I will do in a another blog post.

Tuesday, October 2, 2012

SQL Server Database Testing...

One of the tasks that is vital once Development is complete as part of a database development project is testing. There are different types of testing on Database Development projects that involve stored procedures, functions,Views and tables. Initially lot of the testing was based on sql scripts which would have been developed for different scenarios. Lot of the database testing which I did was done primarily using SQL Scripts. These days lots of development shops have adopted Agile Framework wherein the Testers and Developers need to be really co-ordinated. Recently there are tools that are being developed for Database Testing which provide lot more options and flexibility. One of the tools that i came across was SQLTest From Red Gate Software. I have used tools developed by Red Gate and have found them very useful and efficient. Here is the link for SQL Test: The main purpose of the SQLTest tool is to develop unit test cases scenario and execute them in a more user friendly manner. As quoted by Red Gate: With SQL Test you can discover defects much earlier in the development cycle and make continuous integration, agile development, and test-driven development achievable goals. SQLTest is powered by tSQLt which is a framework developed by Red Gate. One of the feature which I like is the ability to check the test cases into SQL Source Control which is a another offering from Red Gate. SQL Source Control can be connected to the following:

All others

Please use the link : for further information. As Database projects continue to grow in complexity testing of database development projects have become all the more important so that integrity and accuracy of the data will be maintained.

Friday, September 28, 2012

SQL Saturday in Charlotte...

One of the neat things about the SQL Server community is ability to share knowledge and advice through community events. SQL Saturday is a neat concept where in on a Saturday SQL Server related topics are presented and discussed in a city. This gives the ability for the folks who tight on budget to attend the SQL Saturday events and get good information and also network with othe SQL Server Professionals. BI has grown tremendously over the years and the number of folks working with SQL Server BI is increasing and growing. On Oct 27,2012, sql server saturday BI edition is going to be held at Location: Central Piedmont Community College (CPCC), Levine Campus, 2800 Campus Ridge Road, Matthews, NC 28105 (located in South Charlotte). Please see link for additonal details about the event and the schedule of topics/speakers.
Hope folks who are attending get benefited from the above event.

Monday, September 24, 2012

Import Data into SQL Server From SAS...

In the current business environment, there are lot of data sources being used and constantly data is being pulled into one DataMart/DataWarehouse from another data source. In certain environments I have seen SAS being used very heavily. One of my blog readers had a question about how to import data from SAS into SQL Server. One of things to be taken into consideration is what environment is SAS running currently. SAS runs on large UNIX environments and also PC windows based environments. In case SAS is running on windows, Here are the options available: The following Link talks about Bulk Loading For ODBC:,
Here is a reference to:
SAS/ACCESS Interface to Microsoft SQL Server

The above link mentions about how to export data from SAS to SQL Server. The other approach is to use SAS and export data to a file location in a flat file/.csv format and then use ETL tools to pull data from the file location.

There are cases where we would want to use SSIS to pull data from SAS. First one would need the SAS Providers for OLE DB, please use the link below:

Please refer to the blog by Rafi Asarf: titled loading SAS files with SSIS.

The above link provided detailed information on how to import data from SAS using SSIS. Also read the comments section where in certain error situations have been discussed.

Hope all of these provide enough information on working with data from SAS.

Thursday, September 20, 2012

Database Modelling...ER Studio...

One of the important aspects of database design is to model the database. There are two types of models one is logical model and the other is a physical model. It is important to model the data that needs to be analysed or reported. Modelling provides a abstract view of the tables present in the database, here in modelling terms we talk about entities and relationships. With good modelling one can understand the value of data beneath it and also provides a better view of the business that we are trying to capture in terms of the Data. Most of the times the models (ER) are reverse engineered, rarely do models are generated first and then database is designed. Sometimes one would have a fair idea of the entities that would be present in the database through a model. I recently started working on a data sourcing project and one the requirements was to have a ER model of the data that we are receiving from the source. In order to generate/create the ER model I decided to use the ER/Studio Data Architect Version 8.5.3 from Embarcadero technologies.
The install was pretty straightforward. Once I launched the ER Studio, I realised that there lot of options available in this tool. There are 3 different options for creating a model:
Draw a new data Model: There are 2 options Relational,Dimensional
Reverse Engineer an existing Database: Here one would connect to a database and choose the tables that need to be modeled. This is one of the most used option.
Import Model From: This can be done on ERX file,SQL File and External MetaData.

I choose the option of Reverse Engineer an existing database, this is done by connecting to a sql server database which is set up using the ODBC setup option in the Tools menu in the main screen of the ER Studio.

Once I connected to the database I choose from the list of tables I needed to model and the ER Studio takes you through a couple of screens and finishes up generating the ER Model. In the Screens there are options to choose relationships based on Foreign Keys. Overall there are lot of features that are available in the ER Studio which makes it a full fledged modeling tool.

Monday, September 17, 2012

Database Monitoring...

One of the important tasks of DBA is to monitor Databases and check how the sql servers are performing. As one would expect there are lot of aspects involved in SQL Server Database monitoring. With the advent of bigger applications and systems requiring quick response times, Database performance has increased in importance. When a company has lot of servers a good tool would certainly come in handy. One of the tools recently i came across was SQLMonitor developed by RedGate. I have loved products from RedGate especially the backup database utility they have. The SQLMonitor tool is a web based database monitoring tool. It is avaialble for download and test at the following link:
One of the neat things about the SQL Monitor is that it has got a web based interface. There are several other features offered by this product. The features are Centralized Alerts,StraightForward Views,Real Performance Data,Top 10 expensive queries list, Built in Analysis,User Roles. The below image is taken from the Red Gate Software.
The Real time Performance Data feature is pretty neat it contains lot of useful information about Cluster,CPU,Disks,Network and System Processes. For More Information Please check out the red gate site at

Monday, September 10, 2012

SQL Server 2012-Database Partitioning...

One of the issues that we frequently face in databases is how to query large amounts of data in a table. One of the suggested methods is to use partitioning of tables and indexes. In case of data partitioning the data in the table is partitioned horizontally and all the partitions of the tables need to be reside in the same database.
In SQL Server 2012 the number of partitions supported has increased to 15,000, earlier it was limited to 1000.  One of the benefits of using partitioning is transfer or access parts of data efficiently. One of the scenarios were this would help is when loading OLAP systems from OLTP databases. There are a some concepts which are central to table partitioning. They are:
Partition Function: A database object that defines how rows of data are partitoned based on a partitioning column.
Partition Scheme: This is helps map the Partition of a partition function to a set of filegroups. This helps perfomance of backup operations of filegroups.
Partitioning column: The column or index of a table that is used to partition a table or an index.
One of the common scenarios where partitioning is implemented is the sliding window scenario. Please take a look at the following link (though it is for SQL Server 2005) it gives us a good idea of how to implement the partition in SQL Server:

Let us say if we wanted to create a partition on Sale Date and have data stored for every month, one of the ways to do it is:
(Please note: Partition Function can only be created in Enterprise Edition of SQL Server, only Enterprise Edition Supports Partitoning).

create partition function pfMonthly (datetime)
as RANGE RIGHT for values('2012-01-01','2012-02-01','2012-03-01','2012-04-01')

CREATE PARTITION SCHEME pfMonthly as partition pfMonthly all
to ([primary])

One can use the following objects to check the partition and the range of values for partition:

select * from sys.partition_range_values
where function_id in (select function_id
from sys.partition_functions
where name in ('pfMonthly'))

Tuesday, September 4, 2012

DBVisualizer-Database Management Tool

When it comes to Database Management there are lot of tools out there. Database Management is one of the key activities performed by Database Administrators, Developers and Architects (depending on the Situation). With respect to SQL Server there is SQL Server there is SQL Server Management studio, now there are times situations where one need to connecto to multiple data sources. This would hold true when one is working on ETL/DataWarehosuing/Data Sourcing projects. One of the tools I recently started using (since I am working on  multiple Data Sourcing projects) is Database Visualizer. This product is available from, DbVis Software. In the web site mentioned before there is a DBVisualizer Personal Edition Download available for 21 days free. The Installation of this software is pretty straightforward, one of the activities that need to be completed after installing is to make sure the drivers for the data sources are installed as well. When the software is launched we get the following screen:

In order to create a database connection , click the + icon on the toolbar, then we have screen to enter the database connection information. One of the options available is Database type as seen in the screen below.

When it comes to Database type there are bunch of options available like the ones listed below:  there are also options for SQL Server as well.

Once the database connection has been created, there options available for SQL, Scripts and other Database Management functions. One of the features that appeals to me is wide variety of connectivity to different data Sources.

Thursday, August 30, 2012

Cryptography-SQL Server 2012

During the past few years with increasing attacks on systems and online portals, one of the most talked about areas has been cybersecurity and how to secure systems of all kinds. There have been regulations passed in different industries which call for increased compliance and state of the art security mechanisms.
Security is a vast field, one of the area which I would like to focus is on Cryptography. The area of cryptography deals with the practice of techniques for secure communication when there are presence of adversaries(folks who try to thwart secure communications/exchange of information). In the area of databases cryptography is gaining a lot of importance. SQL Server 2012 has options to add cryptography providers. There are different topics in the area of cryptography within SQL Server, each topic/category provides a level of encryption for the data. The mechanisms available in SQL Server are:

Transact-SQL functions
Asymmetric keys
Symmetric keys
Transparent Data Encryption

Each of the above mechanisms offer varying level of encryption. Asymmetric keys offer a higher level of encryption than symmetric keys. In SQL Server, the cryptographic providers can be found in SSMS under the Security option in Object Explorer.

There are different functions available which can be found on Books Online Such as


In the above example an Encryption is done by AES 256 algorithm and and encrypts the key with the certificate Purchase04. Similarly there are other functions to create a asymmetric key and certificates.
The above options provide mechanisms to have more secure databases.

Thursday, August 23, 2012

SQL Server 2008- RAID Concepts...

One of the concepts that is relevant when it comes to Database Storage is how is the data organized on disks. In today's world we have SAN's which are very powerful , I am pretty sure lot of advances have been made in the areas of storage. One of the often looked into concept is the concept of RAID levels, how one should configure database storage and what RAID level to use for the Database Servers. Here is a link  which provided a good overview of different RAID levels. I am more of developer than a Database Administrator. The link below gives us a good introduction to the RAID concepts.

Friday, August 17, 2012

SQL Server 2012-Generating Scripts...

One of the tasks that is often done in a database development environment is the scripting of database and using it for created test databases. Typically if one were script out the tables in the database in the SSMS, one would right click on the database and choose generate and publish scripts. In SQL Server 2012 one of the things i noticed was that when you right click on the database and choose tasks, there are some new options added. For the database scripts, right click on Generate scripts, a response window like the following opens up. In the first screen one chooses the objects that need to be scripted, in the second screen it says Scripting options on the side.

Set Scripting Options Window, Click Advance Button to Set Scripting Options.

When you observe initially you see options where the file needs to be saved and scripts need to be published. I was looking for where to set the scripting options, then i clicked on the advanced button,

Scripting Options Window

 the scripting options window popped up. In the scripting options one of the choices is Script for Server version there are a lot more choices. The choices range from SQL Server 2000 to SQL Server 2012, it includes SQL Server 2008 and SQL Server 2008 R2. This is option is very effective tool to create test databases and also there is an option generate scripts for the data as well.

Monday, August 13, 2012

MERGE-SQL Server 2008

In SQL Server 2008 ETL operations one of the common situations that is handled is the requirement for capturing change record information or what is more colloquially called  as Deltas. One of the common requirement is say on first day of a production ETL process records come in into a SQL Server Database they are inserted. Then from second day onwards the process needs to look for new records that have come in and also records which have changed from day one. In case of records that have changed from day one, we need to update the existing record and add the new changed record into the table. This type of scenarios can be handled by the MERGE statement available in 2008. For example there are two tables say:
Contact and Contact_Master. The Contact table gets truncated everyday and records are populated, the contact_master table has all the records which includes new and updated records. The users access the Contact_Master table through views in order to get the information based on a date range. The following command would help us insert new and modified records into the master table.

INSERT INTO contact_master (contactid,contactname,activebitflag,createddate)

SELECT contactid,contactname,activebitflag,createddate FROM
( MERGE contact_master S
USING contact L
ON S.contactid=L.contactid
WHEN Matched
AND S.activebitflag=1
THEN UPDATE SET S.activebitflag=0
THEN INSERT (contactid,contactname,activebitflag,createddate)
VALUES (L.contactid,L.contactname,1,GETDATE())
OUTPUT $action,1 AS activebitflag,L.contactid,L.contactname,GETDATE() as createddate
) AS ActiveInserts (Action,activebitflag,contactid,contactname,createddate)

There are different parts to the SQL statement: The Merge statement has a MATCHED section which handles updates of records matched between contact and contact master. The NOT MATCHED section handles the insert of new records.There is OUTPUT clause in MERGE Statement that can be useful, there is a variable called $action which holds the keywords such as INSERT,UPDATE and DELETE, this can be very useful especially when one wants to know how many records got inserted.updated and deleted as part of the Merge Statement. In the above statement, the results of the merge are captured in a derived table. In the derived table all the records which were flagged as UPDATE as part of initial merge, get inserted as new records into the table, so a record which already exists gets updated and also gets inserted with a different flag. This is one of the strategies used to capture delta information.

Monday, August 6, 2012

SQL Server 2012-Execution Plans

One of the most often used and very valuable tool available for SQL Server Developers is the Show Execution Plan feature available in SSMS. The execution plan would be used every developer in order to understand the execution plans generated by the queries and more importantly is used for performing troubleshooting on queries that are not performing well. In SQL Server 2012 there is a new hint which is available in the Execution Plan. In order to demonstrate the concept I created a simple table called National.
Here is the structure/script of the table created.

CREATE TABLE [dbo].[National]
[NationalID]    Varchar(50) NULL,
[EntityID] [int] NULL

Once the table was created I inserted some sample values into the table. Once this was complete I was ready ro run some queries and look at the execution plan. The first query I ran was

SELECT [NationalID],[EntityID] FROM [dbo].[National]
WHERE [NationalID]='1'

When I ran the above query there was nothing abnormal with the execution plan, kind of saw what was expected.

I changed the above query to test the NationalID by passing a integer value, so i modified the query and ran it as

SELECT [NationalID],[EntityID] FROM [dbo].[National]
WHERE [NationalID]=1

When I looked at the execution plan there was  one thing I noticed, there was an "!" point next to the SELECT icon with COST 0%. I looked at the "!" and had a message saying Type Conversion in Expression.

meaning there was an implicit conversion performed since NationalID is defined as VARCHAR and this may affect "Cardinality Estimate in query plan cache. This is very interesting as if this were a bigger table with large number of rows such implicit conversions could cause performance issues.

Friday, August 3, 2012

SSISDB Catalog-SSIS 2012

There have been lot of changes made in SSIS 2012, some of which I have blogged earlier. One of the key areas of change has been the deployment section. When a Integration Services 2012 is configured along with the Integration Server, there is a new Catalog which is the key point for getting information about the SSIS packages deployed to the Integration Server. There are several categories of SSIS packages which have been addressed in the SSISDB catalog. This is different from the system tables that are available in MSDB database. The focus of the SSIDB catalog is to enable the administrators to handle the SSIS packages better. Let us focus on the different tools available to focus on categories of troubleshooting.
In case one needs to get information about the performance and execution information of SSIS package, we have the following catalogs that can be used.


All of the above objects provide vital information about an SSIS package. Currently SSIS is being used in lot of big ETL operations, so getting information about the packages in terms of execution speed is very vital. As always performance of ETL is very critical when it comes to data availability. The other area of troubleshooting category is the ability to Add,Query and remove data taps in a SSIS package Data flow.
Here are the objects that can be used.


In Summary the addition of the SSISDB datalog to SSIS 2012 provides a lot of productive tools in order to administer SSIS packages.

Tuesday, July 31, 2012

SSIS Parallel Processing

In SSIS one of the key metric that is observed is how well the SSIS packages are performing. The ETL operations become critical to the the availability of data marts/data warehouse which are used for different types of reporting by Business. One of the package level property that is available in SSIS is MaxConcurrentExecutables, please see image below:

Sam Vanga , a MS BI Expert has explained this property in further detail along with some illustrations. The value of -1 is the default setting, it should not be altered without fully understanding the impact of the value.
The link to the artile is here:
Hope this article is useful for developers working with SSIS packages.

Tuesday, July 24, 2012

SQL Server BI...SSAS

BI in MS SQL Server is growing at a very rapid pace, at the same time there is a lot of expertise that can be shared among professionals. SQL Server Analysis Services is still the core product for doing the traditional BI and there are customers who use SSAS is the traditional way compared to the upcoming self service/Ad hoc BI solutions. One of the questions which was recently asked in a Q&A Setting was that How would one deploy a sustainable/reliable cube, meaning a SSAS cube with minimal downtime. Here the emphasis is on what kind of server architecture would be used to achieve the above scenario. In order to answer/handle this scenario, one should have really hands on experience deploying SSAS cubes in highly visible/Sensitive environment. One of the solution suggested was that 1) One needs to take advantage of the core processors on the server 2) Since SSAS cubes are predominantly read only, one could Load Balance the servers on the which the cubes are deployed on so that there is minimal downtime. One can also refer to the following link for tuning the SSAS 2008 cubes.

Tuesday, July 17, 2012

SQL Server-Results in Grid

In SQL Server Management studios while running queries, the results can displayed in text format or for better formatting the results can be displayed in Grid format. When the results are displayed in grid format the column headers with the column values are displayed. The results can be saved to .csv file/excel, let us say in case the columns need to moved in order of appearance, say the the 10th column in the result set need to be next to the first column in the result set, this can be in done in the Results in Grid Format. In order to swap the columns,
1. Click on the column header(say 10th column in the result set) that need to be moved, keep the left mouse button pressed, this would be change the background of the column header to be suppressed.

Please refer to the notes in the image where in I have used an actual example.

2. With the left mouse button pressed drag the column header and start moving it to the column header where it needs to be placed.
3. When the column header which is being moved is on top of the column header(the current 2nd column header) where it needs to be placed, release the left mouse button.
4. This would complete the movement of the column header, now the 10th column would be in the second place, the current second column would be moved by one to the third place.

Thursday, July 12, 2012

SQL Server - SSDT(Data Tools)

One of the tools I recently came across was SSDT for SQL Server. SSDT stands for SQL Server Data Tools. This tool once downloaded and installed integrates with the VS 2010 Professional environment. It is powerful tool which provides lot of database related functionality and extends the current database features available within VS. In one sense it is like providing SSMS type functionality from VS. That being said the question might arise what would be reason to have two feature similar tools SSMS and SSDT. There are some features which stand out in SSDT. One of the feature available in SSDT is Compare Schemas, this would allow one to compare database schemas, this would be very handy when one has Dev,QA,UAT and Production environments. This would provide the database admin/developer an ability to maintain consistency of schemas across all environments. The other feature which i liked was the ability to script out data from a table. This is very useful when there are certain tables that need to populated with base data in a new installation or a testing environment. The interface of SSDT is very similar to the SSMS interface the difference being the options which are available. One of the Pre-requisites when installing on VS 2010 is that SP1 of VS 2010 should be installed. There is a very good website on msdn which has all the information needed:
Microsoft SQL Server Data Tools ...

In the above site there is also an article related to Getting Started with LocalDB Debugging using SSDT

where in the concept of LocalDB is explained.
In Summary SSDT is a good tool for Database/BI folks who are hands-on while working with VS2010, in the future I will explore the options on this tool and writeup some blog posts.

Monday, July 9, 2012

SQL Server-Career Perspective...

Technology today is evolving at a very fast rate wherein it takes a good bit of work to put arms around it. Growth and advances in technology is mutlifold spreading across different aspects. SQL Server is no exception from what was a pure DB product to today with multitude of offerings where in expertise in a specific area of sql server demands a lot of time and effort. With this in the background how does one shape up a career, more so for experienced folks how do they relevant and when is the right time to give up your job and move on. I came across this article by Brent Ozar who is a SQL Server expert and writes amazing articles. The article is titled "Why I quit my jobs"
This would be a article to gain a perspective while I try to research my next blog write up.

Monday, July 2, 2012

SQL Server-BI Questions...

Recently I happened to attend a interview for a BI Position which was a based on a Consulting type of role. The questions spanned from SSIS,SQL,SSRS and SSAS. Of course if there was a an expsoure to the newer features such as PowerPivot,Powerview and use of sharepoint integrated mode that would have opened up an other set of questions. Here is a list of questions which were asked:
1. The first questions was around slowly changing dimensions, what are the the different types of SCD's. How is each type of SCD implemented from a schema standpoint. In SCD type 2 scenario how would one locate the most recent record. This was a pretty easy question to answer, of course more real project implementations help significantly.
2. What kind of methodology was used whil designing/building data marts/Data Warehouses (Inmon or Kimball), depending on the answer the next set of questions were asked.
3. What is a snowflake schema, how is it designed in SSAS, give an example, aslo differentiate between a snowflake schema and a parent child dimension. Here a understanding of snowflake schema and practical experience of implementing such dimensions in SSAS would help. Also implementing a parent child relation ship in SSAS would help as well, here with respect to parent child (it is about performing a self join on the table).
4. What is the purpose of Lookup task in SSIS, give a scenario describing where a Lookup task would work well, also provide a contrasting scenario where Lookup task would not work well.
5. While peforming fact table load thorugh a ETL process how would one perform Dimension lookups for the existing of valid keys, also in case of late arriving dimension data, how would the related fact data be handled.(Here one of the options is how to use UNKNOWN attribute which could be part of a dimensions).
6. How are Aggregations designed in SSAS, what is the default behavior, if one were to selectively build aggregations how would one do it. (here an understanding of SSAS cube architecture, how aggregations are performed is very important).
7. Compare and Contrast MOLAP vs ROLAP, what are your personal obervations regarding performance such storage in SSAS, here building cubes in these two storage modes, understanding how load,processing and retrieval of data from such cubes is important. This focuses on how cube is built kind of behind the scenes in SSAS.
8. How is security set up for a cude, in case security needs to setup for a dimension how is it performed, what is importance of having a role.
9. What different kinds of SSRS implementations are available. What different kind of reporting work were performed by SSRS (relational/SSAS cube reporting). In case of SSAS cube reporting how was the MDX created, here understanding of MDX very important.
10. Was TFS used in yourprojects, for what purpose was TFS used and was it used for SSIS,SSRS and SSAS projects.
I have provided  as much info about the questions that was possible. I feel all the questions focus on understanding of how things work, why such approaches are being adopted and what are the best practices. In order to tackle these a good hand on implementations and challenges faced would really enhance the responses and provide details as needed.

Wednesday, June 27, 2012

SQL Server-Dynamic Mgmt Views...

While working on Sql Server development projects, one of the tasks that is constantly performed is performance tuning of the queries. There are lot of tools/techniques available in sql server to perform this task. Since 2005 with the arrival of dynamic management views, the tracking of the queries that are being run on a sql server database has become a little more easier. One could use the DMV's to find out the totalcpu,totalduration and totalreads consumed by a query. The DMV's that can be used for the above purpose are:



In order to get the actual text and the query plan for a particular row present in sys.dm_exec_query_stats, one has to use the CROSS APPLY operator and pass a parameter to the sys.dm_exec_sql_text, sys.dm_exec_query_plan views. For example it would be something like:

select query_plan,
substring(stext.text, (stat.statement_start_offset/2)+1
, ((case stat.statement_end_offset when -1 then datalength(stext.text)
else stat.statement_end_offset
end - stat.statement_start_offset)/2) + 1) as txt
from sys.dm_exec_query_stats as stat
cross apply sys.dm_exec_sql_text(stat.sql_handle) as stext
cross apply sys.dm_exec_query_plan (stat.plan_handle) as qplan

In above code segment, the sql_handle value is passed as a parameter dm_exec_sql_text to get the actual sql text, and the plan_handle is passed to get the actual query plan. The query plan is displayed with a hyperlink, clicking on the link will show the execution plan of the query.

Tuesday, June 19, 2012

SSIS 2012-Data Viewer

As I am exploring SSIS 2012, one of the features which I have used in SSIS is the data viewer functionality. It is one of the ways to find out what data is passing through the data flow task. It gives an idea about the data that is being extracted from source before being committed into the destination table. This feature is useful to get an idea about the data like profiling the data that is being sourced. In SSIS 2008, in order to enable the data viewer, within the data flow task one had to right click on the flow connecting the source and destination, choose the Data viewers option and then choose Add from the Data flow path editor. In SSIS 2012 enabling data viewer has been made simpler. In the data flow task, right click on the path connecting the source and destination, then click on the Enable Data viewer option as illustrated below.

Once that is done, click the Edit option to edit the columns that need to displayed in the data viewer. In case the data viewer is not added one can disable the option by following the image shown below. Choose the Disable Data viewer option.

In Summary in this blog post I wanted to show case the data viewer option in SSIS 2012.

Monday, June 18, 2012

SQL Server Questions...

One of the most sought after topics is SQL Server Interview questions. All of us who are SQL Server Professionals would have gone through different interviews and would have had different experiences. No matter how senior the positions are in SQL Server there are certain concepts which would have to be understood for a person gaining an entry into SQL Server. Here is a article written on SQL Server central which gives out a set of sql server questions.
The questions in the above article addresses Database Administration area and Database Design. One of the questions goes like are the operations on a Table Variable Logged.

Thursday, June 14, 2012

SQL Server-Tuning

Recently I ran into an issue with a SSIS package. One of the Exceute Sql task was insert records into a table from a view on a sql server database. In the last few days the package started failing frequently and it was on the execute sql task inserting records, the error reported in the job logs was a network error. I started examining the source view and looked at the select statement of the view. I noticed somethings write way: The view was using other views, each view in turn were referencing 4-5 base tables,also there was one view which was used repeatedly in the joins. When i ran the Select statement from the view it took over 20 mins to bring the data back. I knew that nested views could be causing the slowdown, analysed the join and indexes. I took the view which was used again and again the joins, converted them into a CTE of its own. Once the CTE was created the data was inserted into a temp table. Once this was completed, the remainder of the view was converted into another CTE. Then i combined the temp table created earlier with the CTE and ran the query. The data was retrieved in less than 2 minutes which was a huge performance gain from earlier. One of the takeaways from this exercise was that when using a VIEW as a source, the underlying SQL of the view needs to be examined. In case there is nesting of views, there could be a different way to model the data that is needed. To summarize this was a very interesting exercise of performance tuning.

Wednesday, June 13, 2012

SSIS 2012-Project Deployment

In my previous blog post, i wrote about the different deployment models available in SSIS 2012. In continuation with the previous article, the Project deployment model uses Parameters. In fact with the Project deployment model Parameters become of paramount importance. Typically in Project deployment model parameters are used to assign package properties. A project usually contains packages and parameters and is built into a project deployment file (.ispac extension). As per MSDN/Books Online
A parameter specifies the data that will be used by a package. You can scope parameters to the package level or project level with package parameters and project parameters, respectively. Parameters can be used in expressions or tasks. When the project is deployed to the catalog, you can assign a literal value for each parameter or use the default value that was assigned at design time. In place of a literal value, you can also reference an environment variable. Environment variable values are resolved at the time of package execution.

At the center of the project deployment model is the project deployment file (.ispac extension). The project deployment file is a self-contained unit of deployment that includes only the essential information about the packages and parameters in the project. The project deployment file does not capture all of the information contained in the Integration Services project file (.dtproj extension). For example, additional text files that you use for writing notes are not stored in the project deployment file and thus are not deployed to the catalog.

For more differences between Project Deployment and Package Deployment, please refer to the following link.

Tuesday, June 5, 2012

SSIS 2012 - Deployment Model

In SSIS 2012 there are two deployment models One is package Deployment Model and the new one being introduced is project deployment model. In SSIS 2008 we have been working with the package deployment model. In this model once a SSIS project is created, we have a project folder under which we have sub folders like Data Sources, Data Source Views and SSIS packages. In SSIS 2012 when we create a SSIS project it is by default created in the Project Deployment Model. In the Snapshot below we see Project Params, Connection Managers , SSIS packages and Miscellaneous.

This is where the new deployment Model has it s advantages: As one can see there is no data Source folder instead we have a Connection Manager folder. In this folder all the connections used by a package can be placed. This is done by : In the connection manager tab right click on a connection and choose the Convert to Project Connection.
Once this option is chosen the connection is added to the Connection Manager folder. Now the connections available in the connection manager folder are available for other SSIS packages with in the SSIS project. This becomes very useful since if a connection has to change it needs to be done to connection once within the Connection Manager folder, then the changes are visible to all the other packages that use connection within the project. In a nut shell Project Deployment options provides flexibility while working with many packages within a SSIS project.

Monday, June 4, 2012

SQL Server 2012 - New Functions...

With the release of SQL Server 2012, there are lot of new user defined functions which have been introduced in T-SQL. One of the popular or more used functions are CAST and CONVERT, these are used to perform conversion of data from one data type to another. There is a new function which has been introduced in SQL Server 2012 called PARSE. It has the following Syntax:

PARSE ( string_value AS data_type [ USING culture ] )
PARSE(), expects three parameters,
string_value – String value to parse into the specified data type.
data_type – Return data type, numeric or datetime type
culture – A language (English, Japanese, Spanish, Danish, French etc.) which will used by SQL Server to interpret data.
SELECT PARSE(’23/04/2012′ AS datetime2 USING ‘en-GB’) AS [Using PARSE Function]

There are more functions which are described in detail in this article:

Tuesday, May 29, 2012

SQL Server 2012-Offset

When we viewing a large data set in a report the data is split across pages. This is also called pagination where a certain set of rows are displayed for the user. There are lot of techniques available to achieve such a functionality. In SQL Server 2012 there is a feature in T-SQL that would enable the user to achieve such a functionality. The feature is called OFFSET and FETCH, this allows users to build paging applications. Let me explain the use of OFFSET and FETCH with the following Example:

SELECT OrderDetailID
FROM Products
ORDER BY OrderDate

In the above Select statement, the OFFSET provides a starting row from which to display the data, FETCH provides the number of rows to displayed in a page/result set. The above statement would be for say the first page of the result set. In order to view say the next page, the OFFSET value would change from 0 to 15, for the next page it would change from 15 to 30. Now in order to make this select more general for paging, the select statement would be modified to the following:

SELECT OrderDetailID
FROM Products
ORDER BY OrderDate

The Value of a variable can be controlled through a function or a table for the application, like as when the user hits the next page the value being passed to the offset variable will change. Like how variables are being used for OFFSET, the FETCH clause can also take variables. For example:

SELECT OrderDetailID

FROM Products
ORDER BY OrderDate
FETCH NEXT @FetchRows - @Offset + 1 ROWS ONLY;

Tuesday, May 22, 2012

SQL Server 2012-T-SQL

As part of the SQL Server 2012 release there have been several enhancements. One of the most used in the database side of things is T-SQL, it has a very wide application. In oracle there is a feature called sequence which is table independent and the new value can be obtained any time as well as during UPDATE. This is a powerful feature which has been non existent till SQL Server 2008. In SQL Server 2012 the concept of sequence has been introduced. In the SQL Server Management studio one can fin sequences under programmability under the Database. It is illustrated in the image below.

For example in the user database called Features I created a sequence called seq_test where the start value is set to 1 and is incremented by 1. The syntax is given below.


The statement above will create a sequence seq_test. Now In order to retrieve sequential values from the sequence use the following statement:

SELECT NEXT VALUE FOR dbo.seq_test. This would return the value 1 since the starting value is set to 1. When the select statement is run again the value will be 2.
Now If I were use this sequence in a insert statement where in one of the values is an incremental value I could do the following: Let us assume we have a table called Employee which a id column and a name column.
INSERT INTO dbo.Employee
VALUES(NEXT VALUE FOR dbo.seq_test,'James');
In Summary this is a useful feature which can be utilised by developers in situations where a incremental type value is needed.

Wednesday, May 16, 2012

Expression Task-SSIS 2012

As I Continue to explore the features of SSIS 2012, I have noticed some new components which are available. One of the new tasks which is available in the control flow tab is the Expression task. We are all familiar with expressions which use within each component to evaluate a certain condition or to assign properties with a value. In SSIS 2012 Expression Task does the same thing as Expressions but it is in the form of a task. The Expression task is available in the control flow and one just needs to drag and drop it to the control flow. Once the task is place in the control flow, Right click on the task and choose Edit. This would open up the familiar Expressions window. One of the new things in this window is that Parameters can also be accessed along with the variables. One could check the value of a Parameter in the Expression Task and then in the workflow one can check the value of the Parameter by using expressions in the workflow. I have attached a sample Control flow which utilises the Expression Task.

In a Nutshell the Expression Task can be used to control the workflow and gives Expressions a higher visibility by being a Task. This gives Expressions a higher visibility in SSIS 2012 compared to earlier versions where they were associated with the Component itself.
One can also notice in the image above that there is a slider component on the design surface This gives the developers the ability to zoom in on the flows being developed. In cases where the SSIS flows become very complex, the slider control can be useful to focus in areas that need to understood or edited.

Friday, May 11, 2012

PowerView Environment...

Powerview as I already mentioned in my earlier posts is an important addition to SQL Server 2012.
Here is a snapshot of the design enevironment: Courtesy Introduction to SQL Server 2012 (Microsoft Press).

The Screen layout kind of resembles Report Builder 3.0 but it has additional features (As one would expect). The PowerView Tool is available in Sharepoint Server 2010 Enterprise Edition. Rather than working in design mode and then previewing the report, as you do when using Report Designer or Report Builder, you work directly with the data in the presentation layout of Power View.

You start with a tabular view of the data that you can change into various data visualizations
There are some benefits running Reporting Services in Sharepoint Mode.

Scale Reporting Services across web applications and across your SharePoint Server 2010
farms with fewer resources than possible in previous versions.

Use claims-based authentication to control access to Reporting Services reports.

Rely on SharePoint backup and recovery processes for Reporting Services content.

Monday, May 7, 2012

SQL Server 2012 (Ebook)

There are some free ebooks available from Microsoft press, some of the ebooks include SQL Server 2012 Introduction and SQL Server 2008 R2 Introduction. I downloaded the SQL Server 2012 ebook which was around 10 MB pdf file. The content was good in the sense of providing what are the new features available in SQL Server 2012, the book covers from Database Administration to Reporting services. The book is divided into two main parts Database and Administration, the Second part being Business Intelligence. The material is extensive in the sense that the content is not merely a bullet list of issues. Below is the link provided, I would like to that the for providing the links.

Hope you guys enjoy the ebooks.

Thursday, May 3, 2012

SSIS (Components Vs TSQL)

There is always a lot of debate about how to design SSIS packages. The question always is whether to use components or TSQL in SSIS packages and which would perform better. Recently a discussion in the MSBI groups in Linkedin has started off  a very good debate.
Here is the link:
SSIS - TSQL Stored Procedures Vs SSIS inbuilt transformations
There are some very good points for using SP/SQL and Components.

Tuesday, May 1, 2012

SSIS 2012

Once I had the SQL Server 2012 Bi edition installed I decided to start creating a SSIS package in SQL Server 2012. As mentioned in my previous blog post BIDS has now a new name called SQL Server Data Tools. On launching the SQL Server Data Tools one can customize the setting for Business Intelligence projects. I created a New SSIS project and created a new package. I created a new data flow task with a source has a flat file and destination as a sql server destination. In the Data flow tab I noticed some new things. One of the new things i noticed was Favorites section, this has a Source Assistant and a Destination Assistant. These
two tasks would help the developer with creating a source and destination connection. One of the other new source which i notice in the source connections was the availability of a CDC source.  The CDC source reads change data from a SQL Server  change table  making it available for processing  by downstream  data-flow components. As I build out package I will be posting blog entries detailing the new features available.

Monday, April 23, 2012

SQL Server 2012-BI Edition

I finally got to download the BI Edition of SQL Server 2012, this edition is new in SQL Server. The edition is focused on tapping the new features that have been added to the BI landscape. There were certain new features which i noticed in SQL Server 2012 BI edition. I choose to install the SQL Server DB engine,SSAS,SSRS and Integration Services. SSRS install has gone through a revision, there are new features added to SSRS sharepoint mode which is to facilitate the install and execution of Powerviewer. In SSAS the server can be configured in two modes 1. The regular multi dimensional and data mining mode. 2. The tabular mode, in this mode SSAS server would handle self service BI capabilities. Based on these two options, in case a business needs to handle both then i guess there would be two instances of SSAS one handling the regular MOLAP capability and the other Server would handle the self service piece. It is pretty evident that both the modes cannot coexsist on the same SSAS server. Integration Services and BIDS have gone thorugh a change in this edition. Int the programs under SQL Server 2012, one gets to see a new program called SQL Server Data Tools, this is the new name for BIDS. On launcching the SQL Server Data tools it opens a VS 2010 Shell and it can be customized for Business Intelligence Development. Once the VS 2010 shell has been customized and if you choose the option New Project and the project type is Integration Services Project, there are some elements which appear in the design page of SSIS package. In SSIS 2012 one of the main new features is the addition of Parameters which is available at the Project level and the Package Level. In the Solution Explorer one gets to see the Project Parameters like in the image show below. The project parameters is named Project.params.

In the main design page of the SSIS package one gets to the addition of an extra tab called Parameters.
The third tab (the one after Data flow tab is the Parameter tab) is the screen to add parameters.

Based on the initial looking around in SSIS 2012 parameters has a big role to play. As i continue to explore I will blog about the usefulness of Parameters in SSIS 2012.

Friday, April 20, 2012

PowerView Video-SQL Server 2012

SQL Server 2012 is released, the database market is flooded with alternatives, Bigdata concepts and tools are on the rise, the number of reporting tools are also on the rise. PowerView of Microsoft's big components in Data Visualisation is ready, how does it stack up, what works well in PowerView, what are areas of improvement, the following link is a video which gives a very good perspective of PowerView.
The video is interesting and important because PowerView has been evaluated on the basis of current cognitive research around Data Visualisation principles. There is a lot to cover in SQL Server 2012 a much bigger and better offering from Microsoft.

Monday, April 16, 2012

NoSQL Databases...

As the Database world evolves one of the rapidly developing space in Databases is NoSQL databases. There are a variety of implementations available. I had blogged about NoSQL Databases in an earlier blog post. Here is one I would like to share with you all, it is a database called MongoDB. There is a very detailed and excellent article post on MongoDB architecture by Ricky Ho. It is very well written and the differences between Relational and NoSQL have been highlighted.

Thursday, April 12, 2012

SSIS Connections...

One of the key aspects of working with SSIS is using Connections which are key to extracting data from data sources and populating data into destination databases. As the ETL project gets bigger, managing connections becomes a very important aspect. It can become very tedious if one is opening and closing packages as the SSIS packages are moved from one environment to another and at the same time changing connection strings. There are different configuration methods to alleviate some of the redundant work.
Andy Leonard SSIS expert recently gave a presentation on Managing SSIS connections in SSIS 2008.
He provides insight into connection managers and how to store connection strings and at the same time be compliant with the audits. Here is the link:
A very good video to get insight into connections in SSIS.

Monday, April 9, 2012

Custom SSIS Logging...

In one of my blogposts I had blogged about using the Logging Options Provided by SSIS. The options available in SSIS are good. Depending on one's needs one could develop Custom SSIS logging solution. One of the options I have utilised is to use a table in a SQL Server database, create the table with meaningful columns which capture information that would help in troubleshooting and auditing. The columns which i would like to include are source_rows, Inserted_rows,error_rows. These would enable a developer to keep track of how data is flowing through a SSIS package. The audit table for example could be named as SSIS_Package_Audit with columns such as ( UniqueId(Identity),PkgName, StartDate, EndDate, SourceRowCnt, InsertRowsCnt, ErrorRowCnt,ElapsedTime_m, LastError,ServerName,DatabaseName). When a Package starts as one of the first steps in the SSIS package, a new row is inserted into the table to indicate that the package has started. As the package progresses through different points, the audit table can be updated. In case the package encounters an error, an update can be made to the LastError Column in the table with an appropriate error text. This could tell us what error was encountered when the package was run. Inserts and Updates to the audit table can be done through a Execute SQL Task or a Script Task.
I have included a image for reference, The first script task inserts a row, the last script task actually updates the row which was inserted with values for the rowcounts and enddate, Last error column would be set to NULL. This would indicate a good run of the SSIS package. The solution being presented here is a starting point, this can be greatly enhanced to suit the business needs and dependencies of packages could be included in the table.

Thursday, April 5, 2012

SSRS 2012

With SQL Server 2012 being released there is lot of new features all across the board which are being studied and used. The mechanisms to pull data and use data in different ways is ever exploding. SQL Server Reporting Services 2008 and 2008 R2 were good releases where in SSRS started to being looked at as a reporting tool with good features and capability. SSRS 2012 has an increased set of features, Microsoft has listed the main highlights of the new product. Sharepoint is also getting lot of traction in lots of companies these days. In order to look at what is new in SSRS 2012, here is a link:
I am planning to start working with the evaluation version of SQL Server 2012 and keep abreast of what is being offered.
As a side note: I am including a nice article on how to land a job as a developer: This appeared in SQL Server Central as well.

Wednesday, April 4, 2012

Data Visualization-Part 2

As Newer software and newer versions of operating systems are getting rolled out so is the field of data visualization which is rapidly expanding. In my earlier posts I had blogged about data visualization and Powerviewer. In this blog post I would like to share new tools which have come out and which address different platforms. Computerworld magazine has come up with a article which looks into 8 Data visualization apps and summarizes the findings.
The article nicely summarizes the different operating systems on which these apps run.

Wednesday, March 28, 2012

SQL Server Certification-2012

With the SQL Server 2012 being released and making its way into the enterprise. The certifications program for SQL Server 2012 are being readied for launch. How much certifications help out has/is will be a subject of debate. Keeping debates aside, Microsoft has provided information on how to go about it. Here is the link to:
It is being recommed that folks complete certifications for SQL Server 2008 before getting into SQL Server 2012 one of the reasons being that the path will be shorter. I guess more will be know about these as the program rolls out in  June 2012 (as mentioned in site).
In case any one has completed the SQL Server 2008 BI certification, could you please share the resources in the Comments section of the blog.
Thank you

Friday, March 16, 2012


I was working on Developing a SSRS report which displays data about SSIS packages and package dependencies. To get a better perspective the report displays data about packages and its dependencies meaning which need to be completed before running a particular package. I created a stored procedure to provide data in a Master/Detail format and worked on presenting it in a SSRS report. In the report I had the detail row which was the dependencies, the main column on which the report was grouped on was the Parent/Master package column. The idea was to provide a collapsible master/detail report such that when expanding the Parent package one can see the dependent packages in the detail on the same report.
The SSRS report has a list of Package Names as a parameter with Multi Select enabled. By default the report runs for all the values and displays the parent Package and it s dependencies. The issue was that the
'+' sign was appearing before the Parent Package Value and also the details were shown. Ideally we should have the '-' sign appearing before the Parent Package name since all the details are show when the report is run initially. In order to handle this scenario I used a Property which is available for a column in the report, it is called the InitialToggleState Property. By default this property is to false, this was making the '+' sign appear before the column data in the report. I set it to True and ran the report, this displayed the data with the '-' sign in front of the Parent Package and the details were displayed in the same report.

Below is an image with the Report result: