Thursday, January 29, 2015

Power BI-New Version...

Microsoft is pushing into the cloud space in a big way, making significant investments in the cloud and in related development tools. One of the cloud topics discussed in the Microsoft earnings report was self-service BI tools, especially Power BI. Power BI is a set of tools released by Microsoft to enable users to perform self-service BI operations, aimed mainly at the power user/systems analyst/financial analyst user base, and it is different from the traditional data warehousing/BI tools Microsoft offers. With the recent release of Power BI, Microsoft has decided to open the door in a big way and allow users to play around with it. On going to the website http://www.powerbi.com, one is asked to enter a work email address and sign up for access. Once you set up an account, you can log in to the Power BI public preview page, where you can see sample dashboards and charts; there is also a start-up video for first-time users who would like to go through the capabilities. Please refer to the screen below, which shows how it looks when you first log into the Power BI Public Preview.

There are capabilities to create Reports, Datasets, and Dashboards. When you click on the Get Data option, you are presented with the following options:
The user gets a variety of options to play around with; it is interesting to see that there are connections to repositories like GitHub. The main purpose of this release seems to be to get the tool into the hands of as many users as possible so that its features can be explored and analyzed, and with that to generate some serious interest around Power BI. I hope this blog article helps in exploring Power BI and its features.

Monday, January 5, 2015

Happy New Year 2015...

Wishing all the readers of my blog a happy and prosperous New Year 2015. This year one can watch how big data technologies mature and get adopted within organizations, what happens to predictive analytics and machine learning technologies and how well they move into mainstream organizations, and how the role of the data scientist evolves alongside traditional data warehouse technologies. There is a very good introductory course, The Data Scientist's Toolbox, that covers the foundation required for data analysis and the data scientist's role; it is available at www.coursera.org.

Monday, November 3, 2014

SQL: Counting records...

One of the most common tasks in SQL is counting records. As data grows, counting records can become an expensive operation, so folks need to be aware of the environments where counts are being taken; performing counts in production, in particular, can prove expensive. In the link below, SQL Server expert/writer Aaron Bertrand explores some of the techniques for doing counts along with their advantages/disadvantages:
http://sqlperformance.com/2014/10/t-sql-queries/bad-habits-count-the-hard-way.
There are different methods that are discussed (a rough sketch of a couple of them follows the list):
COUNT(*).
Using system views.
Putting a table lock on the table whose count is needed.
Changing the isolation level.
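
To make the trade-offs concrete, here is a minimal T-SQL sketch of a few of these approaches, assuming a hypothetical table named dbo.Orders; the system-view count reads metadata, so it is fast but may lag slightly behind the true row count:

-- Full scan: accurate, but can be expensive on a large table.
SELECT COUNT(*) FROM dbo.Orders;

-- System views: reads row counts from metadata instead of scanning.
-- index_id 0 = heap, index_id 1 = clustered index.
SELECT SUM(p.[rows]) AS approximate_count
FROM sys.partitions AS p
WHERE p.[object_id] = OBJECT_ID(N'dbo.Orders')
  AND p.index_id IN (0, 1);

-- Changing the isolation level (here via NOLOCK) reduces blocking,
-- at the cost of possibly reading uncommitted rows.
SELECT COUNT(*) FROM dbo.Orders WITH (NOLOCK);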

Monday, October 13, 2014

Data Science/Data Scientist...

With the advent of big data technologies and the increased use of unstructured data in business analytics, one of the roles beginning to rise in demand is the Data Scientist. The area of study a Data Scientist specializes in is Data Science. The huge amounts of data have called for more predictive analysis instead of a focus on just the operational data. The power of predictive analytics, if done right, is that it gives a business the ability to make better decisions about future plans: for example, how a company should target its products, or what type of features a product needs to have to address customer requirements. The emergence of data science has made Microsoft re-prioritize the data mining features they incorporated into SSAS; the data mining algorithms are being re-purposed into the Machine Learning tool in Microsoft Azure. I covered machine learning in a previous blog post. To make more effective use of the tools available in Azure, it is important to understand the basics behind machine learning. Coursera offers a course on R Programming; it is free of cost, and one can enroll by going to the https://www.coursera.org/ website. In case one is interested in going beyond learning the R language, Coursera has a complete Capstone course in Data Science, which includes learning about the algorithms behind data mining and getting introduced to machine learning, apart from learning the R language. The Capstone course covers experimental design, statistical inference, and regression models. This is one area that is growing very fast, so having a good foundation with the courses listed above can provide a step into the predictive analytics world. One can also refer to the following link to understand the different types of algorithms in data mining.
http://msdn.microsoft.com/en-us/library/ms175595.aspx

Data Science Capstone Course link:
https://www.coursera.org/specialization/jhudatascience/1?utm_medium=courseDescripTop


Monday, October 6, 2014

Hive, BI Best Practices, Machine Learning...

The SQL Saturday BI edition held in Charlotte on October 4, 2014 had some interesting sessions that I attended. They were really insightful and provided some good information that I would like to share. The first session was given by Kevin Goode on the topic of Introduction to Hive; the session was really engaging, with lots of questions and answers. Hive is basically a layer on top of Hadoop that allows users to write SQL-style queries. Hive in turn takes these queries and converts them into MapReduce jobs, which are executed in the Hadoop layer, and the results are then returned to the user. Hive uses a language called HQL, a SQL variant with lots of similarities to MySQL and Oracle syntax, and the query constructs of HQL are very similar to Transact-SQL. One of the main things to keep in mind is that HQL/Hive does not have a great optimizer like SQL Server, so while constructing join statements in HQL one has to be very careful about the placement of the tables in the join; the rule of thumb for now seems to be that the smaller tables come first and then the larger tables. Hive/HQL is best suited for analytical queries, and it has no updates or deletes. HQL/Hive also provides some form of execution plan, but these are not as sophisticated as the ones you see in SQL Server. The error messages produced by HQL are more Java-based, so it can take a little getting used to when working with them. While working with Hive/HQL, the developer should be aware of the type of data coming in and how it is partitioned; the organization of the data really helps in optimizing HQL queries. The results produced by HQL can be used to feed a traditional data warehouse system. One of the areas where Hadoop is weak is security: security is not very structured in Hadoop and has to be achieved using other bolt-on tools. Hortonworks is one of the big players in Hadoop distributions and provides information regarding HQL; the front end it provides to connect to Hive is completely web based, and users can use this tool to write Hive queries using HQL. Please visit the Hortonworks website here.
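To illustrate that join rule of thumb, here is a small HQL sketch with hypothetical table names (dim_product as the smaller dimension table, fact_orders as the larger fact table); in classic Hive the smaller table is listed first so it can be buffered while the larger table is streamed:

-- Smaller table first, larger table last (classic Hive join guidance).
SELECT p.category,
       COUNT(*) AS order_count
FROM dim_product p
JOIN fact_orders o
  ON p.product_id = o.product_id
GROUP BY p.category;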
The second session I attended was on tabular models. The topic was how to build a tabular model using Visual Studio 2012 with an SSAS 2012 tabular instance installed. It was an informative session, as it brought out the differences between traditional MDX-based SSAS and tabular solutions. One of the key differentiators is that traditional SSAS is disk based while tabular is in-memory and provides good speed. The limitation of the tabular model is the amount of memory available to hold the tabular models; also, with tabular you don't have writeback capabilities. The demo was pretty impressive, and some of the steps in the demo were very similar to how one would go about building SSAS cubes. The topic was presented by Bill Anton and his blog is located at:
The third session I attended was the most crowded of all. It was about best practices for delivering BI solutions, and the audience was made up of BI professionals and IT folks from various companies. SQL Server BI expert James Serra provided a great session on best practices. He started off with why BI projects fail: a Gartner study has revealed that around 70% of BI projects fail, with the failure factors ranging from lack of expertise/experience to poor communication, requirements, and project management. He highlighted the issue of clashes between Business and IT over how a solution should be delivered. One interesting suggestion was for IT to provide a clean data warehouse, build some abstraction layers on top of the warehouse, and then allow the users to utilize self-service BI solutions. He also stressed understanding the differences between Kimball and Inmon principles, the capabilities of tabular vs. multidimensional models, and star schema vs. relational designs. Please visit his website for some great information on this topic and other aspects related to data warehousing.
The final session I attended was an introduction to Microsoft Azure Machine Learning; this topic was really exciting and informative. It is basically about predictive analytics and the tools Microsoft provides as part of the Azure framework. Microsoft has a group called Microsoft Research, which initially worked on developing the algorithms for the data mining product within SSAS. In the past few years there has not been much of a push on data mining within SSAS; recently, with Microsoft making a big push for cloud service offerings as part of the Azure framework, that Microsoft Research work is being made part of Machine Learning in Microsoft Azure. Here is a link that provides a head start on the machine learning topic: http://azure.microsoft.com/en-us/documentation/services/machine-learning/. During the session, Bill Carroll provided an insightful demo of the machine learning framework within Microsoft Azure. Azure provides a very nice framework for a developer to set up an experiment, and the workspace for setting up experiments looks very similar to the SSIS workspace within Visual Studio. As part of setting up an experiment, one feeds data to an algorithm based on the type of outcome that is needed; the model is then trained, scored, and evaluated, after which it can be published. Once the model is published, it can be called from an application through a web service, and there are code snippets available in C# and Python to call the web service and execute the model. Some real-life examples of machine learning are the list of recommended products generated when a user purchases a product on Amazon, and the list of recommended movies in Netflix based on a user's watching preferences.

Monday, September 15, 2014

SQL Teams - Working Together...

With so many products coming out of the gate, each trying to address one part of the BI/data warehouse problem, one of the most important aspects is that the teams and folks involved in BI projects work together well. Since BI/data warehouse implementations involve cross-team interactions, it becomes vital that the team works together for the bigger picture. One of the classic areas of debate is between the developer and the DBA. Please read the following link about the pain points between developers and DBAs, and how these can be resolved. I found this article, written by David Poole, a Data Architect, to be very informative; it could provide some useful tips.
https://www.simple-talk.com/opinion/opinion-pieces/dbas-vs-developers-a-sad-tale-of-unnecessary-conflict/?utm_source=ssc&utm_medium=publink&utm_content=dbavsdev

Thursday, September 11, 2014

SQL Server 2014 - Top Features...

Here is a summarized list of new features in SQL Server 2014 that could be worth an upgrade. There are still a lot of customers on SQL Server 2008; they could take the path straight to 2014 instead of going through SQL Server 2012.
1. New In-Memory OLTP Engine
2. Enhanced Windows Server Integration
3. Enhancements to AlwaysOn Availability Groups
4. Enhancements to Backups
5. Updateable ColumnStore Indexes
6. SQL Server Data Tools for Business Intelligence
7. PowerBI for Office 365 Integration

The new In-Memory OLTP engine and updateable ColumnStore indexes are bound to attract the OLTP and OLAP customer bases. Please read through the feature details in the following link, which gives a more detailed description of each feature. A small sketch of those two headline features follows the link.

http://sqlmag.com/sql-server-2014/sql-server-2014-important-new-features.
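
As a quick, hedged illustration of those two features, here is a minimal T-SQL sketch; the table, index, and column names are hypothetical, and the memory-optimized table assumes the database already has a MEMORY_OPTIMIZED_DATA filegroup:

-- Updateable clustered columnstore index (new in SQL Server 2014):
CREATE TABLE dbo.FactSales
(
    SaleID    INT   NOT NULL,
    ProductID INT   NOT NULL,
    Amount    MONEY NOT NULL
);
CREATE CLUSTERED COLUMNSTORE INDEX ccsi_FactSales ON dbo.FactSales;

-- Memory-optimized table for the In-Memory OLTP engine:
CREATE TABLE dbo.SessionState
(
    SessionID INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    Payload   NVARCHAR(4000)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);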