Sunday, October 20, 2013

SQL Saturday - ETL, CDC and Hadoop

The day finally arrived, and I was excited to attend the SQL Saturday event in Charlotte on October 19th. The arrangements were very well done, and I really liked the idea of Speedpass to make the registration process quicker. There were a good number of vendors, from product companies to consulting firms. The schedule was impressive in that it covered a broad range of topics.
My first session was about ETL and ELT, by Paul Rizza. It covered a wide range of SSIS topics: the pitfalls one needs to watch out for, and the tasks and logic in SSIS that can drag down performance. One interesting property lets you indicate that the records are already sorted at the source, which prevents SSIS from having to perform a sort before a Merge operation. This sorted-at-source property is set in the advanced properties of the source connection. There was another interesting discussion about the RetainSameConnection property, which works well except in cases of parallel execution, where the same connection cannot be shared. There were also interesting tidbits about the SSISDB catalog and what to watch out for when moving between servers.
There was a good discussion about Raw files, which are strongly typed. One of the biggest advantages of using a Raw File source is that the files are a lot smaller, and because they are strongly typed, SQL Server does not need to perform data type validation before doing a bulk insert. This was a neat feature for me. The performance issue with the For Each Loop container was discussed in detail, especially how many logging operations happen when an Execute SQL Task runs inside one. A good replacement for a For Each Loop container can be a Data Flow task. There was a lot of good discussion about memory usage by SSIS, along with a look at the Lookup transformation, especially the Partial Cache and No Cache modes. The pitfalls of using command destinations and user-defined functions were covered as well. The session was filled with interesting tidbits about SSIS.
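To see why telling the engine that the inputs are already sorted matters, here is a minimal Python sketch. It is not SSIS code; the function names and sample rows are made up for illustration. When both inputs are known to be sorted on the merge key, they can be merged in a single streaming pass. Otherwise everything has to be collected and sorted first.

```python
import heapq

def merge_presorted(left, right, key):
    """Merge two inputs that are ALREADY sorted on `key` in one streaming pass."""
    return list(heapq.merge(left, right, key=key))

def merge_with_sort(left, right, key):
    """Fallback when sort order is unknown: buffer all rows, then sort them."""
    return sorted(list(left) + list(right), key=key)

# Hypothetical sample rows, each input already ordered by "id".
source_a = [{"id": 1, "src": "A"}, {"id": 4, "src": "A"}, {"id": 7, "src": "A"}]
source_b = [{"id": 2, "src": "B"}, {"id": 5, "src": "B"}, {"id": 9, "src": "B"}]

by_id = lambda row: row["id"]
merged = merge_presorted(source_a, source_b, key=by_id)
print([row["id"] for row in merged])
```

Both functions return the same rows in the same order here, but the first one never holds more than one row per input in flight, which is roughly the saving you get by skipping the extra sort.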
The second session was about Hadoop. With Hadoop gaining a lot of traction, the session was fully packed. The presentation by Rick Hallihan and Bill Carroll was very informative. It began with an overview of Hadoop, MapReduce and the tools available in the Hadoop universe, then moved on to how Microsoft is adopting Hadoop through HDInsight, in collaboration with Hortonworks. A word count example was demonstrated in Hadoop and then repeated as a C# .NET example within Visual Studio 2013. There are class libraries in .NET that support the Hadoop MapReduce functions. The main part of the session was creating a Hadoop cluster within Windows Azure using HDInsight. The cluster, which had 4 nodes, was set up in Windows Azure in 4 clicks. Comparisons were also made between the SQL and big data worlds. Overall, a very impressive session.
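For readers who have not seen the word count example, here is a rough sketch of the idea in plain Python. It does not use Hadoop or the .NET libraries shown in the session; the function names are my own, and everything runs in one process. It only illustrates the three steps: map each line to (word, 1) pairs, group the pairs by word, and reduce each group to a total.

```python
from collections import defaultdict

def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups

def reduce_phase(word, counts):
    """Reduce step: collapse the list of 1s for a word into a total."""
    return word, sum(counts)

def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())

result = word_count(["big data big", "data hadoop"])
print(result)
```

In a real cluster the map and reduce steps run in parallel across the nodes and the framework handles the grouping in between. Here it is all sequential.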
The third session I attended was about CDC (Change Data Capture), available in SQL Server 2008. The CDC feature is very useful for auditing and for data warehousing, especially SCD (Slowly Changing Dimension) build-out. The difference between CDC and CT (Change Tracking) is that Change Tracking captures only the fact that a row changed and maintains no history of the changes. CDC will work only on tables that have primary keys, and the source database has to be a SQL Server database. There was a good example in which records were deleted by mistake from a table enabled for CDC, and the demo showed how the deleted records can be recovered using the CDC tracking table. Without CDC, this would have been a perfect storm. The session was presented by Jose Chinchilla (http://www.sqljoe.com).
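The recovery demo can be pictured with a small Python sketch. This is only a toy model, not how SQL Server implements CDC, and the class and method names are hypothetical. It keeps a CDC-style log holding the full row image for every change, next to a Change Tracking-style set that only remembers which keys changed. Only the first one has enough information to bring deleted rows back.

```python
class TrackedTable:
    """Toy table with a CDC-like change log and a CT-like set of changed keys."""

    def __init__(self):
        self.rows = {}             # primary key -> current row
        self.change_log = []       # CDC-like: (operation, key, full row image)
        self.changed_keys = set()  # CT-like: only which keys changed, no data

    def insert(self, key, row):
        self.rows[key] = dict(row)
        self.change_log.append(("insert", key, dict(row)))
        self.changed_keys.add(key)

    def delete(self, key):
        old_row = self.rows.pop(key)
        self.change_log.append(("delete", key, old_row))
        self.changed_keys.add(key)

    def recover_deleted(self):
        """Restore rows that are currently missing, using the latest delete image."""
        restored = []
        for operation, key, image in reversed(self.change_log):
            if operation == "delete" and key not in self.rows:
                self.rows[key] = dict(image)
                restored.append(key)
        return restored

table = TrackedTable()
table.insert(1, {"name": "Alice"})
table.insert(2, {"name": "Bob"})
table.delete(1)
table.delete(2)                      # the accidental delete
recovered = table.recover_deleted()  # rebuilt from the row images in the log
print(sorted(recovered), table.rows)
```

The `changed_keys` set would tell you that keys 1 and 2 were touched, but not what the rows contained, which is why history matters for this kind of recovery.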

Monday, October 14, 2013

SQL Saturday-Charlotte

SQL Saturdays are a very good way to keep up with happenings in the SQL Server world. The sessions have very good speakers covering a wide variety of topics. The SQL Saturday scheduled for October 19th in Matthews (a suburb of Charlotte) is focused on the SQL Server BI stack, with topics including Hadoop and big data. The event promises to be very exciting, with a wide range of topics and speakers.
Please refer to the links below for more information on the sessions and the schedule. Topics include:
ETL, Data Visualization, Self-Service BI, Data Quality, Master Data Management, OLAP, Big Data, SharePoint, Data Management

SQL Server Saturday: http://www.sqlsaturday.com/237/eventhome.aspx
Schedule Link: http://www.sqlsaturday.com/237/schedule.aspx

Thursday, October 3, 2013

Gartner Report - BI Tools

It is always interesting to read the different reports covering the BI space, which is growing at an enormous pace, with vendors offering different capabilities, covering different aspects, and adding cloud-based offerings. Linked below is an article on the Gartner report that discusses the BI Magic Quadrant for 2013. Some quotes from it:

Microsoft leads in ability to execute
As I did with the DW report, I'll start with the "winners" in this MQ. Each axis ("ability to execute" and "completeness of vision") had its own winner, and that for the former is Microsoft. I tend to go out of my way to point out my bias on Microsoft. I've long thought the MS BI stack provided significant value, and have watched that value increase recently with the introduction of PowerPivot and SQL Server Analysis Services Tabular mode (in-memory column store databases that are highly integrated with Excel and SharePoint) and Power View (interactive visualization technology integrated with those same products). Microsoft has also recently added Master Data Management and Data Quality tools that, while still maturing, round out very nicely a stack that costs little to nothing for customers already using SQL Server, Office and SharePoint.
Please read the full report below:
http://www.zdnet.com/gartner-releases-2013-bi-magic-quadrant-7000011264/