Tuesday, May 13, 2014

Data Analytics-SQL Server...

One of the more popular term that is doing rounds these days is the term called Data Scientists. This seems to have emerged alongside the growth of Big Data. As part of the Data scientists role it is expected to have a good knowledge of statistical tools and data modelling concepts using mathematics plus exposure tools like SAS,SPSS and R. One of the tools i came across recently which is being used in the Big Data world is R. Today I had the opportunity to attend a session organized by PASS on SQL Server and R, this was presented by Tomaz Kastrun. The session focused on how  R can be used with SQL Server to perform Data Analysis. As per Tomaz:
"The Language R  is a powerful language for  data analysis  with all the great features  for data import from SQL Server Environment".  The Language R along with SQL Server data would be a powerful tool for data scientists to use wide range of statistics. There are statistical functions in SQL Server as part of T-SQL but they tend to be more of the standard/simplistic statistical functions. In case we have to use more complex statistical functions then the T-SQL Code would get longer and more complex which would lead to maintainability issues. This is where tools like R Language come in which can be used with SQL Server to produce very effective results. Here is a definition/purpose of R-Language (which was presented as par of the topic).

In order for the SQL Server Integration with R, one would need the following:
SQL Server
R-Language (needs to be installed) - This is available at the site: http://www.rstudio.com
Data (of Course), it would be good to cleanse the data before using for analysis with R.

The topic presented shows different opportunities that exists to leverage current tools like SQL Server and combine them with a powerful statistical language like R. This provides different analytical capabilities for the Data Scientists/Developers.

Wednesday, May 7, 2014

SQL Server 2014 and Tempdb...

One of the challenges faced in SQL Server development is how to handle tempdb and make sure that temp tables are not causing excessive disk i/o. Then came table variables and there have been lot of posts about which is better (table Variables vs temp tables). Now in SQL Server 2014 there has been some kind of improvement for tempdb. Please read the blog post below from SQL Server Expert Kendra Little:
http://www.brentozar.com/archive/2014/04/memory-ssd-tempdb-temp-table-sql-2014/
One thing to note is that tests were run on machines which had storage on SSD. The above blog article is built upon the original blog post listed below:
http://blogs.msdn.com/b/psssql/archive/2014/04/09/sql-server-2014-tempdb-hidden-performance-gem.aspx. To quote from the original blog article:
"Prior to the SQL Server 2014 change the select into may have written all the pages accumulated to disk. The SQL Server 2014, eager write behavior, no longer forces these pages to disk as quickly as previous versions".