Wednesday, July 16, 2014

Nice to Have Books...

With growing online content, downloadble books available for business intelligence, data warehousing, it is always go to have books handy on one's book shelf. When one has to read through several pages sometimes a book becomes very handy. Here is a blog written by SQL Server expert James Serra on what would be the nice to have books, it covers the areas of Data warehouse, Business Intelligence, Master Data Management and Consulting. Please refer to the link below: the list is pretty exhaustive.
http://www.sqlservercentral.com/blogs/jamesserra/2014/07/16/it-books-that-should-be-on-your-shelf/
Some of the books which I have are Ralph Kimball book on data warehousing and Microsoft Reporting Services 2012.

Monday, June 30, 2014

Renaming SYSSSISLOG table...

In SSIS we have different ways of logging the progress of the SSIS package, these can be tracked in files or can be done with a table. One of the popular ways of auditing the SSIS package is using a SQL Server table. The table that is used is dbo.sysssislog table, this table is created by default when choosing the Logging of SSIS package to be done via SQL Server. There was a comment/question by one of the readers of my blog, as to whether the table can be renamed into something more meaningful. The answer is yes, here are the steps listed: The one thing to watch out for is to modify the sp_ssis_addlogentry stored procedure to point to the new renamed audit table.

1. Click the table dbo.sysssislog in the database that has the table.
2. Right click on the table and rename the table to what the new name should be.
3. In the SQL Server Management studio expand programmability and choose Stored procedures.
4. Right click on the sp_ssis_addlogentry stroed procedure and choose modify.
5. Replace the table dbo.sysssislog in the stored procedure with the new renamed table.
6. Press F5 to recompile the stored procedure.
7. This audit table is present in the SQL Server 2008 and SQL Server 2012 database systems.
8. In SQL Server 2005 databases, the table is called sysdtslog90.
9. The stored procedure is called sp_dts_addlogentry, it would be in the stored procedures folder in SSMS.




Tuesday, June 3, 2014

Hadoop-DMXExpress

Hadoop is definitely making its presence felt in big organizations, it is being used as a complement to the existing traditional BI investments made by the companies. One of the projects which i am following, there is a very measured approach to getting hadoop into the ecosystem and making sure there is good value proposition for the company. Understanding the concepts involved with Hadoop and the infrastructure that goes with it will provide a good understanding of the concepts. There has been lot of talk about folks need to know java to work with hadoop. In certain cases finding such resources can cost a lot of money in the sense good java resources are expensive. Anyways I don't want to get into that debate. One of the key functions that is involved working with hadoop/Big data is ETL. We still need to be able to move the data from the HDFS/map reduce outputs to say other traditional data warehouses, basically perform data integration. One of the tools that is available in the market today is from a vendor called Syncsort (http://www.syncsort.com). They have a tool called DMX Express which is very useful for performing ETL/Data integration functions. As per the vendors website, DMX Express "Accelerates Data Integration performance, eliminate manual coding, and reduce total cost of ownership to capitalize on the opportunities of Big Data". There is another tool called DMX-h, this tool as per the vendor:"From comprehensive connectivity and optimized sort capabilities, to point-and-click interfaces for developing MapReduce jobs, DMX-h provides everything organizations need to successfully deploy ETL in Hadoop. No coding, no scripting, no tuning; just smarter data integration". I had the opportunity to see the demo of DMX Express the user interface of the tool looks similar to the tasks we have in SSIS. Folks who have worked with SSIS should find the transition a lot easier. Also as per the person who was doing the demo there is no prerequisite of knowing Java, the tool takes care of what has been done by writing Java classes.
DMX Express link: http://www.syncsort.com/en/Products/DMX/DMX-Overview.

Tuesday, May 13, 2014

Data Analytics-SQL Server...

One of the more popular term that is doing rounds these days is the term called Data Scientists. This seems to have emerged alongside the growth of Big Data. As part of the Data scientists role it is expected to have a good knowledge of statistical tools and data modelling concepts using mathematics plus exposure tools like SAS,SPSS and R. One of the tools i came across recently which is being used in the Big Data world is R. Today I had the opportunity to attend a session organized by PASS on SQL Server and R, this was presented by Tomaz Kastrun. The session focused on how  R can be used with SQL Server to perform Data Analysis. As per Tomaz:
"The Language R  is a powerful language for  data analysis  with all the great features  for data import from SQL Server Environment".  The Language R along with SQL Server data would be a powerful tool for data scientists to use wide range of statistics. There are statistical functions in SQL Server as part of T-SQL but they tend to be more of the standard/simplistic statistical functions. In case we have to use more complex statistical functions then the T-SQL Code would get longer and more complex which would lead to maintainability issues. This is where tools like R Language come in which can be used with SQL Server to produce very effective results. Here is a definition/purpose of R-Language (which was presented as par of the topic).

In order for the SQL Server Integration with R, one would need the following:
SQL Server
R-Language (needs to be installed) - This is available at the site: http://www.rstudio.com
Data (of Course), it would be good to cleanse the data before using for analysis with R.

The topic presented shows different opportunities that exists to leverage current tools like SQL Server and combine them with a powerful statistical language like R. This provides different analytical capabilities for the Data Scientists/Developers.

Wednesday, May 7, 2014

SQL Server 2014 and Tempdb...

One of the challenges faced in SQL Server development is how to handle tempdb and make sure that temp tables are not causing excessive disk i/o. Then came table variables and there have been lot of posts about which is better (table Variables vs temp tables). Now in SQL Server 2014 there has been some kind of improvement for tempdb. Please read the blog post below from SQL Server Expert Kendra Little:
http://www.brentozar.com/archive/2014/04/memory-ssd-tempdb-temp-table-sql-2014/
One thing to note is that tests were run on machines which had storage on SSD. The above blog article is built upon the original blog post listed below:
http://blogs.msdn.com/b/psssql/archive/2014/04/09/sql-server-2014-tempdb-hidden-performance-gem.aspx. To quote from the original blog article:
"Prior to the SQL Server 2014 change the select into may have written all the pages accumulated to disk. The SQL Server 2014, eager write behavior, no longer forces these pages to disk as quickly as previous versions".

Wednesday, April 23, 2014

Data Virtualization...

With companies/business having different types of  data marts/datawarehouses there is a fear that these could become independent silos and could not provide value to the business as a whole. There has been lot of resources(Time,Money,People) being invested in building a datawarehouse/data marts. The business would like to get value from the disparate datawarehouses/data marts, so the concept that has been in play for quite a while is Data Virtualization. Data Virtualization could be defined as process of building a agile data layer for easy data access and delivery. This is also sometimes referred to as the data abstraction layer so that users can get access to data quickly. Having defined Data Virtualization, where does one use this concept: Here are some of the situations where Data Virtualization could be used.
1. Data Federation - Scenario: A application requires data from multiple incompatible data sources.
(Example: Federated Views, Virtual Data Marts)
2. Data Warehouse Extension: The Datawarehouse does not contain the required data to create reports (Example: Datewarehouse Extension)
3. Data Virtualization - Build a Agile Data layer for easy data access.
4. Big Data Integration: How to combine Big Data with traditional data for analysis (Example: Hadoop)
5. Cloud Data Integration: Need to integrate systems on site with the Applications running in the cloud (Example: SaaS Application Integration)

Based on the scenario and type of question we are trying to answer, Data Virtualization could be a solution. The concepts described here were based on a article related to the Composite , a data virtualization/Integration tool from Cisco (http://www.compositesw.com/data-virtualization/).

Wednesday, April 16, 2014

SQL Server 2014-EBook

With SQL Server 2014 being released, there are lot of lectures/talks being organised by SQL PASS for folks who want to get a understanding of what is available in SQL Server 2014. There is a free ebook written by Ross Mistry and Stacia Misner on SQL Server 2014. The eBook is titled "Introducing Microsoft SQL Server 2014 Overview". The ebook is around 116 pages and contains primarily 2 Parts: 1. Database Administration, Part 2: Business Intelligence Development.
The Part 1 has 3 Chapters, these chapters highlight the 1. SQL Server 2014 Engine Enhancements, 2. In-Memory Features, 3. High Availability, Hybrid Cloud and backup enhancements.
The Part 2 Focuses on the BI related enahnacements available in SQL Server 2014.  The 3 Main Areas focused on are: Chapter 4 on Self Service BI, Chapter 5: on Power BI, Chapter 6: Big Data Solutions. All of the chapters describe the future BI solutions that are going to be available. The Chapter 6 dwells into the Microsoft Big Data Framework HDInsight. Here is the link for the eBook:
http://www.microsoftvirtualacademy.com/ebooks