Thursday, August 13, 2015

Azure ML Studio - Part 2

Continuing from my earlier blog post on Azure ML Studio, I would like to describe some additional components that can be used while setting up an experiment. One of the main sections available in the Experiment designer is called Statistical Functions. This section has a set of functions to choose from, ranging from elementary statistics to hypothesis testing. These components are typically used once the dataset has been cleansed to the point where one can get accurate readings of the data from the experiment. Please see the image below: in this example, after executing an R Script, the output is fed to a Descriptive Statistics module.
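For reference, here is a minimal sketch of what the body of such an Execute R Script module might look like in ML Studio; the maml.mapInputPort/maml.mapOutputPort helpers are the studio's way of wiring up the module's ports, and the na.omit() cleanup step is just an illustrative placeholder.

# Sketch of an Execute R Script module body (Azure ML Studio).
# The cleanup step shown is only a placeholder for your own logic.
dataset <- maml.mapInputPort(1)   # read the data frame from input port 1
dataset <- na.omit(dataset)       # example cleanup: drop rows with missing values
maml.mapOutputPort("dataset")     # send the result on to the next module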

The Descriptive Statistics module typically reports counts, ranges, statistical summaries, and percentiles.
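For intuition, those figures map onto a few base R calls; here is a rough sketch using the built-in mtcars dataset purely as a stand-in for an experiment's data:

# Rough base-R equivalent of the Descriptive Statistics module's output,
# using the built-in mtcars data as a stand-in.
nrow(mtcars)                                        # count of rows
range(mtcars$mpg)                                   # range (min, max)
summary(mtcars$mpg)                                 # statistical summary
quantile(mtcars$mpg, c(0.1, 0.25, 0.5, 0.75, 0.9))  # percentiles
sd(mtcars$mpg)                                      # standard deviation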

Once the descriptive statistics step is complete, the output can be stored in a variety of destinations via the Writer component. Please see the image below. The data destination can be Azure (SQL Database, blob storage, or table storage) or a Hive query. Each destination has its own advantages; for more information, refer to the link here: https://msdn.microsoft.com/library/azure/7a391181-b6a7-4ad4-b82d-e419c0d6522c


Friday, August 7, 2015

Azure ML Studio - Part 1

Data Science has been experiencing tremendous growth in the business world today, and there is tremendous scope and plenty of job opportunities for people with Data Science experience. One of the challenges has been learning the different components of Data Science, since most of them require a lot of knowledge of statistics, math, and data mining algorithms. Microsoft, for its part, has been working steadily to expose data science to the programming public. Initially Azure was slow to take off, but with growing cloud adoption it has been experiencing a lot of growth, so Microsoft decided to use the Azure platform to provide data science tools for programmers.

One of the very effective tools on offer is Azure ML Studio, a development environment for building machine learning models. The interface of this tool is similar to some of the Visual Studio tools Microsoft has provided earlier. In order to start using Azure ML Studio one needs an Azure account; the whole of Azure ML works on the Software-as-a-Service model. One can use the following link to learn more about the Azure ML capabilities: https://studio.azureml.net/

Once you log in to Azure ML Studio, the first thing that happens is that your workspace is set up. There is a + symbol at the bottom of the workspace; click on it to create your first experiment. You have a couple of choices here: 1) you can create a blank experiment, or 2) you can create an experiment based on the templates provided. Option 2 helps one set up an experiment quickly and understand the various components of an experiment. When you choose from the samples, you can either open the sample in ML Studio or view it in the gallery. I feel tools like Azure ML Studio provide a great first step in exploring the power of Machine Learning and Data Science.

One of the components in the above image is the Enter Data component. This component is primarily used for defining column headings, which can then be assigned to the datasets that are read through the Reader component. In this case the Reader component downloads a file from a website. Since the headers of the downloaded file were not user friendly, we use the Enter Data component to provide meaningful column names, entered here in CSV format. For example, please see the image below for the Enter Data component:
In the image above, column_name is the header row of the CSV text, and the entries below it are the actual column names that will be assigned to the dataset read by the Reader component.
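To make that concrete, here is a sketch of the kind of text one might type into the Enter Data component with the CSV format selected; the column names below are invented for illustration:

column_name
CustomerId
OrderDate
OrderAmount
Region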

Monday, August 3, 2015

Tableau and R Integration...

Among data visualization tools, Tableau is one of the leaders and is used by a lot of organizations in various capacities. The reporting ranges from operational reports to really sophisticated data visualizations combining various data sources. With R growing into a language of choice for Data Science activities such as Machine Learning and Data Mining, it is being integrated with a variety of tools. Given that, it was natural to expect the integration between Tableau and R to happen. Please see the link below for a video on R and Tableau integration.
http://www.tableau.com/new-features/r-integration
Quoting tableau" Tableau Server can also be configured to connect to an instance of Rserve through the tabadmin utility, allowing anyone to view a dashboard containing R functionality".
The link also contains a whitepaper on the integration between the two tools; please check it out.
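For the R side of the setup, getting Rserve running only takes a few lines; here is a minimal sketch (Rserve's default port, 6311, is assumed):

# Start an Rserve instance that Tableau can connect to.
install.packages("Rserve")   # one-time install from CRAN
library(Rserve)
Rserve()                     # starts the server on the default port, 6311

Tableau Desktop is then pointed at that instance through its R connection settings, and calculated fields pass data to R through functions such as SCRIPT_REAL.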
Lots of interesting things are happening in Data Science these days...

Wednesday, July 29, 2015

SQL Server 2016 and R-Integration...Part 2

In this blog post I would like to continue the discussion from Part 1 of SQL Server 2016 and R-Integration. Here I would like to discuss one of the R libraries that can speed up the learning process and get you analyzing data more quickly: a library called rattle. It can be installed in R using the commands below, which can be executed within RStudio (RStudio provides a friendlier UI for working with R commands and scripts).
install.packages("rattle")   # download and install the rattle package
library(rattle)              # load it; needed before rattle() can be called
rattle()                     # launch the rattle GUI

The install.packages() command in R installs all of the files within the rattle library. In RStudio one would set up the options that control how libraries are pulled down as needed; as you can see in the image, the option Use Internet Explorer Library/Proxy for HTTP is enabled.
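If you prefer the console, the download behavior can also be steered from code; a small sketch that pins a CRAN mirror explicitly (the mirror URL is just one common choice):

# Optional: point R at a specific CRAN mirror before installing packages.
options(repos = c(CRAN = "https://cran.rstudio.com"))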

Once the package is installed, you can execute the rattle() function within RStudio; this launches a GUI for rattle with a lot of useful options for doing data analysis. In the Rattle GUI, as you can see, there are many choices of data source; in the example below I am choosing the R dataset called who for analysis. Make sure to hit the Execute option once you have chosen the data source from the drop-down list.

In order to explore the dataset, use the Explore tab and choose the Summary option; I additionally chose Basics. These two options combined provide a quick overview of the who dataset by surfacing some important statistics about it.
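Outside the GUI, a similar overview can be approximated in a few lines of R; this is a sketch assuming a data frame named who is already loaded in the session:

# Rough equivalent of rattle's Explore > Summary with Basics checked,
# assuming a data frame called who is already loaded in the session.
summary(who)                                         # per-column summaries
num_cols <- sapply(who, is.numeric)                  # find the numeric columns
sapply(who[, num_cols], sd, na.rm = TRUE)            # standard deviations
sapply(who[, num_cols], function(x) sum(is.na(x)))   # missing-value counts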

There are many more options available in rattle for much more complex data analysis. As the integration between SQL Server and R continues, I am hoping such utilities are provided by Microsoft so that data analysis can be further enhanced.

Monday, July 27, 2015

Cortana Analytics - Microsoft

In today's world of the big data, cloud, and mobile paradigm, there has been a slew of analytics/dashboard tools trying to get the attention of businesses and grab market share. One of the tools that I came across, and that I look forward to diving deeper into, is the Cortana Analytics suite from Microsoft. Microsoft is coming out with a lot of offerings these days that look exciting and interesting. Here is the link that explains the "why" of Cortana Analytics from Microsoft.
http://www.microsoft.com/en-us/server-cloud/cortana-analytics-suite/why-cortana-analytics.aspx.
One of the key features that I want to explore further is the aspect of perceptual intelligence.
Quoting the web site: "Interact with customers and stakeholders in new ways and infer intent with vision, face, speech, text and sentiment analysis to customize responses and drive appropriate actions". It will be really interesting to see how things play out here. There are a couple of events I plan to attend to see if there are any use cases related to the topic of perceptual intelligence. Once you go to the link above, you can actually see industry-specific use case examples documented on the website. I will post another topic related to Cortana once I get through some presentations.

Thursday, July 23, 2015

Big Data, Column Store, InMemory...

With the increase in data that is being stored and analyzed, concepts such as Big Data, column stores, and in-memory databases are being discussed and used by different organizations. With multiple vendors at play, one size fits all doesn't seem to be working. Please see this video, in which Michael Stonebraker talks about the different concepts such as column stores, Big Data, and in-memory databases. He specifically cautions against giving in to the marketing hype, and recommends running proper proofs of concept on site, with the data that actually needs to be analyzed, to arrive at the right solution for an organization. Here is the link for the video; it is a bit old, but I felt it is very relevant in terms of the questions to be asked.
https://www.youtube.com/watch?v=_00H1cgXeWw. Mike specifically talks about how important columnstore is going to be in Data warehouses.

Saturday, July 18, 2015

SQL Server 2016 and R-Integration...Part 1

Earlier I had blogged about how SQL Server and R can be connected to perform data analysis, and SQL Server 2016 is supposed to bring better R integration. In today's market, data scientists and data wranglers are in great demand, since organizations have piles of data and want to make sense of it and see whether it can provide valuable insights. Given these market trends and the push for aligning business and data more closely, it is important to have analytic skills. In this blog I would like to point to an article which gives a flavor of SQL Server 2016 and R integration, at the link below:
http://blog.revolutionanalytics.com/2015/05/r-in-sql-server.html. The article describes how SQL Server program managers Lindsey Allen and Borko Novakovic demonstrated a prototype of running R within SQL Server. For the R integration, Microsoft has collaborated with Revolution Analytics (http://www.revolutionanalytics.com/products). I hope you find the article and the demo in the link above useful.
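Until that in-database integration ships, the usual way to pair the two has been to pull SQL Server data into R over ODBC; here is a minimal sketch with the RODBC package (the connection string and table name are placeholders):

# Sketch of the classic approach: query SQL Server from R over ODBC.
# The connection string and table name below are placeholders.
library(RODBC)
conn <- odbcDriverConnect(
  "Driver={SQL Server};Server=localhost;Database=SalesDB;Trusted_Connection=yes")
orders <- sqlQuery(conn, "SELECT TOP 100 * FROM dbo.Orders")
summary(orders)   # quick look at what came back
odbcClose(conn)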