Sunday, December 25, 2016

2017 - Analytics Trends...

As we wind down 2016 and move into 2017, there is a lot of anticipation about what analytics will look like in 2017. 2016 saw a lot of development in the area of analytics, with an increasing range of applications across different industries and businesses. Big data made significant inroads in 2016, and I hope to see it solidify in 2017 with more businesses adopting such technologies. There was also a lot of constructive debate around big data analytics in 2016. More companies are introducing machine learning technologies, one of the most prominent being the Echo/Alexa line from Amazon, and other companies are adding machine learning capabilities to their applications.
Below is a link that takes a look at the expected trends in 2017. One of the expected trends, especially for developers in the BI/database arena: traditional programmers will be required to gain data science skills in order to stay relevant, employable, and effective in their careers.
http://www.ibmbigdatahub.com/blog/big-data-and-analytics-trends-2017-james-kobielus-s-predictions

Wishing everyone a very happy and prosperous new year 2017. I hope to write more articles in 2017.

Thank you

Monday, October 24, 2016

Data Wrangling...Trifacta

Today companies and businesses deal with large volumes of data and are constantly trying to extract value from their business data. Many combinations of technologies are used to mine data for predictive/prescriptive analytics. One of the key steps in producing a valid, useful analytical model is data preparation/cleansing. Data often comes from multiple sources and needs to be properly merged so that meaningful analysis can be made. One of the products available in the market today for data wrangling is Trifacta; here is the link for the product: https://www.trifacta.com/products/. As per Trifacta: "Successful analysis relies upon accurate, well-structured data that has been formatted for the specific needs of the task at hand. Data wrangling is the process you must undergo to transition raw data source inputs into prepared data set outputs to be utilized in your analysis". For a detailed description of the Trifacta Wrangler product, please see https://www.trifacta.com/products/wrangler/.
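Trifacta itself is a commercial product, but the core idea of wrangling can be sketched in a few lines of pandas. The data and column names below are hypothetical; the point is the trim/normalize/merge pattern that any wrangling tool automates:

```python
import pandas as pd

# Two hypothetical source extracts describing the same customers,
# with inconsistent formatting across systems.
crm = pd.DataFrame({
    "cust_id": [101, 102, 103],
    "name": ["  Acme Corp ", "globex inc", "Initech"],
})
billing = pd.DataFrame({
    "cust_id": [101, 102, 104],
    "balance": [2500.0, 130.75, 99.99],
})

# Wrangling step: trim and normalize text fields so records line up.
crm["name"] = crm["name"].str.strip().str.title()

# Merge the sources on the shared key; keep every customer seen anywhere.
merged = pd.merge(crm, billing, on="cust_id", how="outer")

# Flag rows that are missing data from one of the sources.
merged["complete"] = merged[["name", "balance"]].notna().all(axis=1)
print(merged)
```

Only the rows flagged as complete would feed the analytical model; the incomplete ones are exactly what a wrangling tool surfaces for review.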

Monday, September 19, 2016

SQL Saturday - Analytics/Machine Learning/R...

I had the opportunity to attend the SQL Saturday event in Charlotte, NC on September 17, 2016. The event was very well organized and hosted by a hard-working, talented team of volunteers. There were a variety of topics spread across different aspects of SQL Server, Business Intelligence, and data analytics. There were 3 sessions which I found very informative and interesting, spread across data analytics, machine learning, and Master Data Management (Microsoft).
The first topic I attended was from SQL Server/data analytics expert Jen Underwood (please visit her excellent blog http://www.jenunderwood.com/ for more information on trends in analytics); the topic was trends in the analytics industry. It covered the skill sets and domains currently hot or growing in the data analytics/big data space. There were interesting demos on how certain jobs can be automated, how robots are beginning to permeate different aspects of human life, and how they are helping out in areas like customer service. Here is a link to a robot video showing human-like expressions and interactions:
Robotics. The interesting aspect of this demo is that rather than being just machine-like, the robot interacts in a very human-like fashion. These types of robots could replace jobs that can be easily automated. Other aspects covered included the cloud, machine learning, and predictive analytics. Another interesting area mentioned was immersive data visualization, where 3-D visualizations can be used to analyze and understand data. One of the visualizations shown was the stock market's rise and fall over the past several years, including the crash of 2008, presented as a roller coaster ride simulation. Here is the link for the demo: http://graphics.wsj.com/3d-nasdaq/. This is a virtual reality guided tour of 21 years of the Nasdaq, a very interesting concept. One thought that went through my mind was how well such visualizations would work in certain types of businesses and organizations. On the whole the topic was very informative about what is coming in the data analytics space and how one needs to be prepared.
The second session was on Master Data Management by Rushabh Mehta (SQL Server expert, past SQL PASS President, independent consultant/CEO). This was a presentation on the very important but often ignored topic of data management. Rushabh went through why Master Data Management is important and discussed one of the projects he did for a client. He explained the process of data cleansing, how records could be de-duplicated, and the use of standardization services from Melissa Data. Melissa Data provides services around address and name standardization, which are very useful when one tries to create a master record for a customer. Here is the link for Melissa and the services they offer: http://www.melissadata.com/. The session also provided insights into how a master record could be created for companies; here, services offered by Dun and Bradstreet were used. Overall the session was very informative and conveyed the importance of Master Data Management.
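Melissa Data and Dun & Bradstreet are commercial services, but the underlying de-duplication idea can be sketched with simple normalization. Everything below (the names, addresses, and the `normalize` helper) is illustrative, not how those services actually work:

```python
import pandas as pd

# Hypothetical customer records; the same person appears under
# slightly different spellings, as often happens across source systems.
records = pd.DataFrame({
    "name":    ["John Smith", "JOHN  SMITH", "Jane Doe"],
    "address": ["12 Main St.", "12 main street", "9 Oak Ave"],
})

def normalize(s: pd.Series) -> pd.Series:
    """Crude stand-in for a standardization service: lowercase,
    collapse whitespace, and expand one common abbreviation."""
    return (s.str.lower()
             .str.replace(r"\s+", " ", regex=True)
             .str.replace(r"\bst\.?$", "street", regex=True))

# Build match keys from the normalized fields.
records["match_key"] = normalize(records["name"]) + "|" + normalize(records["address"])

# Keep the first record per match key as the surviving master record.
master = records.drop_duplicates(subset="match_key", keep="first")
print(master[["name", "address"]])
```

A real standardization service applies far richer rules (postal validation, nickname tables, phonetic matching), but the match-key pattern is the same.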
The third session which I found very useful was the session on machine learning with R by Paco Gonzalez, Corporate Center of Excellence Program Manager at SolidQ. The session was very informative and very nicely presented. Paco touched upon how tweets can be imported from Twitter and analyzed to determine the sentiment around a particular topic or product. He took the example of a product sold on an online clothing retailer's website and showed how tweets about that product can be scanned to understand whether folks are speaking positively or negatively about it. He mentioned that one would get the feeds from Twitter and could also request data relating to particular hashtags. Paco also presented case studies on how machine learning can be used to determine whether a particular customer would stay with a bank or leave it. He demonstrated how past patterns can be used to train a model and how test data can be used to determine the accuracy of the model. The R integration with SQL Server 2016 seems very interesting and exciting; one now has the power of getting predictive results by executing stored procedures.
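The session used R with SQL Server 2016; as a rough, language-swapped sketch of the same bank-churn idea, here is a Python/scikit-learn version on synthetic data. The features and the churn rule below are made up purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 1000

# Synthetic bank-customer features (hypothetical): account tenure in
# years and number of products held with the bank.
tenure = rng.uniform(0, 10, n)
products = rng.integers(1, 5, n)
X = np.column_stack([tenure, products])

# Synthetic label: short-tenure, single-product customers churn more often.
churn_prob = 1 / (1 + np.exp(0.8 * tenure + 1.0 * products - 4))
y = (rng.uniform(size=n) < churn_prob).astype(int)

# Train on past patterns, then score held-out test data, as in the demo.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print("test accuracy:", round(acc, 3))
```

The train/test split mirrors the demo's point: the model learns from historical churners and is judged only on customers it has never seen.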
There was a demo of Microsoft Cognitive Services, which can be used for analysis of text, faces, and emotions; here is the link: https://www.microsoft.com/cognitive-services.
Overall, a very exciting SQL Saturday and a very good information-gathering session.

Friday, September 16, 2016

Data Virtualization-Data Conditioning/Masking

These days businesses are expected to be agile and to deliver solutions quickly and efficiently. This means that while developing products, they have to be moved through the different environments efficiently and quickly. There is also a lot of dependency on data in the test and lower-level environments: the quality of data there needs to be good so that the applications using it can be tested effectively. Organizations often run into challenges while populating data in lower-level environments, due to space constraints or the long time it takes to condition the data, and this affects product delivery. This is where products that specialize in data virtualization come in. Delphix is one such product, enabling organizations to effectively get production-like data into test environments. Here is the website for the product: https://www.delphix.com/. According to the website: "Speed wins. It’s a simple fact. The faster you can deliver new applications, features and upgrades to market, the better your business performs. For that you need faster data. And for faster data, you need Delphix.". Please refer to the link below, which explains the need for such a product: https://www.delphix.com/why-delphix. As the nature of application development keeps changing, the quality of data needed for testing and other pre-production activities becomes very important and essential.
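Delphix virtualizes data at the storage level, which is well beyond a few lines of code; but one small piece of data conditioning, masking PII deterministically so that test rows stay joinable across tables, can be sketched as follows (the data and the `mask` scheme are hypothetical):

```python
import hashlib

# Sketch of conditioning data for a test environment (not how Delphix
# works internally): mask PII deterministically so the same production
# value always maps to the same masked value, keeping joins intact,
# while the real value never leaves production.
def mask(value: str, salt: str = "test-env") -> str:
    digest = hashlib.sha256((salt + value).encode()).hexdigest()
    return "cust_" + digest[:8]

production_rows = [
    {"email": "alice@example.com", "balance": 120.50},
    {"email": "bob@example.com", "balance": 75.00},
]

# Non-sensitive columns (like balance) pass through unchanged.
test_rows = [{**row, "email": mask(row["email"])} for row in production_rows]
print(test_rows)
```

Determinism is the key property: two tables masked independently still join on the masked key, so tests behave like production.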

Tuesday, August 2, 2016

Data Science - Education

With the rapid growth of big data technologies, there has been an exponential growth in data science and its related technologies. This has also led to demand for data scientists, and jobs related to data science are very lucrative. Microsoft has been steadily expanding its cloud-based offerings and getting into big data related technologies and efforts. Since there is a tremendous need for data science skills, Microsoft has come forward to offer a curriculum totally devoted to data science, delivered via edx.org. There are a total of 9 courses, and the price per course ranges from $49-$99. There is also a final project, which requires around 6-10 hours. One can check the link below for all the details:
https://www.edx.org/microsoft-data-science-curriculum
The courses range from using Microsoft Excel to explore data to implementing a machine learning solution for a given data problem. Each course can be audited for free, or one can upgrade to receive a verified certificate on passing the course. Each course has labs, quizzes, and discussion forums; the forums can be used to get questions answered about the concepts being discussed. I hope the courses provide the much-needed insights into data science.


Sunday, June 19, 2016

Hadoop, Big Data...

I was provided an opportunity to write a blog article at Analytics Vidhya about big data related technologies. Analytics Vidhya is a very well maintained website about big data technologies. All the articles are well researched and presented. Please see the link below for my article.

http://www.analyticsvidhya.com/blog/2016/06/started-big-data-integration-hdfs-dmexpress/

Hope the readers find the above article interesting and useful...


Thursday, April 21, 2016

SQL BI Conference 2016 - Pass Business Analytics 2016

With the advent of newer technologies and tools in the areas of data analytics, data science, big data, and data visualization, the PASS Business Analytics 2016 conference, to be held in San Jose from May 3-4, offers a lot of interesting sessions and topics of discussion. Please do check out the website below for further details:
http://passbaconference.com/2016/Home.aspx
It is a tremendous opportunity to grow your network and skill base, with a lot of talented speakers and experts presenting on very interesting topics. There has been a lot of momentum in the Microsoft BI space, especially with Power BI and its related set of tools. There are very useful sessions on Power BI and Power Pivot provided by folks from this company:
http://www.powerpivotpro.com/. They provide a lot of training in Power BI and Power Pivot, and also consulting on projects that need those tools.
When you get a chance, check them out...

Wednesday, April 13, 2016

Data Services

With increasing amounts of data and information, one of the challenges businesses face is how data should be accessed and how it should be stored. One concept that can be used for effective and efficient provisioning of data is data services. One of the requirements for a good data service is making sure the underlying data is clean and properly modeled. A lot of effort needs to go into making sure the data is properly cleansed and correctly modeled in the data store/data marts. Once this is accomplished, data services provide advantages to the consuming applications. The data service concept brings in the notion that data quality work, including cleansing and enriching data, is done in a central place. The advantages of data services include:
1) Agility - Customers can access data quickly, without the burden of needing knowledge of the underlying data.
2) Cost effectiveness - Data analysts/providers build the foundation of quality, modeled data, and the presentation layer can be determined by the consumers. Different applications/consumers can use the standard web services to access the underlying data.
3) Data quality - Data access is controlled through data services, which in turn allows data quality to be improved at the central location.
4) Consistency - Using data services drives consistency in accessing data from the data store and eliminates divergent data provisioning approaches.
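A data service in miniature can be sketched as a thin contract over a centrally cleansed store. The store, keys, and field names below are all hypothetical; in practice this function would sit behind a web service endpoint rather than a local call:

```python
import json

# Hypothetical central data store: data quality work (cleansing,
# enrichment) has already been applied here, once, for all consumers.
_CUSTOMER_STORE = {
    101: {"name": "Acme Corp", "segment": "Enterprise"},
    102: {"name": "Globex Inc", "segment": "SMB"},
}

def get_customer(cust_id: int) -> str:
    """A data service endpoint in miniature: consumers call this
    contract and never need to know how the store is modeled."""
    record = _CUSTOMER_STORE.get(cust_id)
    if record is None:
        return json.dumps({"error": "not found", "cust_id": cust_id})
    return json.dumps({"cust_id": cust_id, **record})

# Different consuming applications reuse the same service uniformly.
print(get_customer(101))
print(get_customer(999))
```

Because every consumer goes through the same contract, quality fixes and model changes happen once, in the central store, which is the agility and consistency argument made above.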
There are vendors who provide data service tools; they can be categorized into:
1. Volume Based Approach
2. Data Type Based Approach.
Hope this blog post provides an insight into Data Services.

Monday, April 4, 2016

SQL Server 2016 - Query Store

Performance tuning is a constant topic of discussion and one of the top priorities in good code development: better-performing queries provide quicker response times, and users and customers are satisfied. There have been constant improvements in SQL Server's performance tuning features, and in later releases the execution plan tooling was greatly improved. It is very important for developers to understand execution plans so that they can improve queries that cause bottlenecks in the application. One of the challenges has been to effectively store execution plans and queries so that they can be reviewed for bottlenecks. SQL Server 2016 introduces the concept of the Query Store, a feature that helps one gain insight into query plans and statistics.
In order to enable query store feature in SQL Server 2016, one can use Transact-SQL:

ALTER DATABASE AdventureWorks2012 SET QUERY_STORE = ON;

The query store feature in SQL Server 2016 can also be enabled through SQL Server Management Studio, by choosing the database properties/query store page.
Quoting from the Microsoft MSDN page, here are a couple of scenarios for using the Query Store feature:
  • Quickly find and fix a plan performance regression by forcing the previous query plan. Fix queries that have recently regressed in performance due to execution plan changes.
  • Determine the number of times a query was executed in a given time window, assisting a DBA in troubleshooting performance resource problems.
The MSDN page for the Query Store provides a more complete explanation of the feature.

Friday, April 1, 2016

Data Gravity...

With the advent of big data, cloud technologies, and data science, there has been a lot of interest in moving towards cloud-based solutions for Business Intelligence and other applications. More companies and businesses are looking towards cloud deployments or showing interest in cloud-based solutions. Meanwhile, there has also been tremendous growth in the volume of data and information being shared and requested. The data volume has brought into perspective the question of whether data needs to be present on-premise or should be moved to the cloud. While researching the ideas being suggested around data volumes, I came across the concept of data gravity. Jen Underwood, data analytics and BI expert, has written an excellent article covering the concept. Please see the link below for the full article.
http://www.jenunderwood.com/2016/02/27/hybridbi/
She covers the concept of data gravity, the types of solutions available, and some of the vendors who offer them. The on-premise vs. cloud-based BI/data storage discussion provides interesting insights and would be very useful for folks who are planning their data storage and provisioning strategy. The concept of stretch databases is also covered in the article. An interesting read on data gravity and the types of solutions being offered.

Wednesday, March 30, 2016

Data Modeling...

In today's world of ever-growing data and information, one of the areas that has been battling for relevance is data modeling; there are wide-ranging opinions about its validity, both positive and negative. One viewpoint favoring data modeling is that it provides context around the data that needs to be accessed and used, how it can be stored, and how it is presented to users. There are a number of situations where how the data is organized has to be presented to the business in a concise fashion. There are different types of models within data modeling: conceptual models, logical models, and physical models. What these types mean and how they differ from each other is very concisely presented in the article below. Quoting from the article: "My uses of conceptual, logical, and physical come from the Information Engineering (IE) methods of data modeling". The article was written by Karen Lopez, Senior Project Manager and Principal Consultant, InfoAdvisors, Inc.
http://www.datamodel.com/index.php/articles/what-are-conceptual-logical-and-physical-data-models/

I hope the above article provides a good explanation, especially for folks who are getting into the area of data modeling.

Tuesday, March 22, 2016

Data Science Resource/Blog...

One of the rapidly advancing areas today is data analytics/data science in the Business Intelligence space. There are a lot of resources and tools emerging that explain the capabilities of data science, which has seen tremendous growth alongside the adoption of big data technologies. It is a very challenging space to keep up with; one of the blogs I regularly visit to understand the concepts and keep up with the trends in data science/analytics is http://www.analyticsvidhya.com/. The blog/website is very nicely laid out, with content-rich articles and tutorials. They cover a variety of tools and also encourage folks interested in data science to participate in challenges and contests. Please do visit the site; it provides amazing content and coverage.

Saturday, March 19, 2016

Experiments - Azure ML Studio

Azure ML Studio is a very intuitive and powerful tool that can be used for performing different types of data analysis experiments. The Studio can be used to set up anything from very basic experiments, like descriptive statistics, to complicated machine learning algorithms. The whole layout of ML Studio is very user friendly and in some ways reminds me of the layout of SSIS packages. I decided to set up a basic experiment in ML Studio to perform certain descriptive statistics calculations on a sample data set. The dataset I used was the adult census data from the UCI repository; I ran it through an R script component to massage the data, and once massaged, it was run through a descriptive statistics component to compute summary statistics for each column. Once all the components are in place, the experiment can be run in ML Studio to get the results. The results can be verified, and if everything looks good, the experiment can be published as a web service and accessed by, say, a .NET program.
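The actual experiment used ML Studio modules, but the same two-step pipeline (massage the data with a script step, then summarize each column) can be approximated in pandas. The rows below are a tiny illustrative stand-in for the UCI adult census data:

```python
import pandas as pd

# Illustrative stand-in for the UCI adult census extract used in the
# experiment (the real dataset has many more rows and columns).
adult = pd.DataFrame({
    "age": [39, 50, 38, 53, 28],
    "hours_per_week": [40, 13, 40, 45, 40],
    "workclass": [" State-gov", " Private", " Private", " Private", " Private"],
})

# Equivalent of the "Execute R Script" step: massage the data first
# (the raw census file carries stray leading spaces in text fields).
adult["workclass"] = adult["workclass"].str.strip()

# Equivalent of the descriptive statistics step: summarize each
# numeric column (count, mean, std, min, quartiles, max).
summary = adult.describe()
print(summary)
```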
Below is an illustration of the experiment I set up in ML Studio.



Hope this provides a basic overview of ML Studio. Azure ML Studio opens up a lot of possibilities to perform different types of data science experiments and bring them more into the mainstream.

Sunday, February 14, 2016

Data Lineage/Data Modeling...

With the advent of visualization and sophisticated reporting tools, there is one aspect that is very important for the analysis of data: for reports and visualizations to be valid and effective, the data behind the charts and graphs must be properly modeled and validated. When analysts look at reports, they very soon ask what comprises each data element and where the data is sourced from. They would also be interested in how the data has been transformed by the time it is shown in the reports. In order for reports and visualizations to be effective, it is very important to have a strategy for data lineage and data modeling.
Typically data is sourced from various systems and finally ends up in a data warehouse or data mart. During this process it is very important to track how the data moves from one system to another. It is possible that data elements move through the process without any transformation. The other possibility is that a data element is transformed: there could be casting or trimming, or extraction of a data element into multiple columns in the target data mart. Data elements could become part of a calculation, or they could end up being candidates for lookup tables. In order to capture all of the above data movements and transformations, it would be good to have a repository where the metadata/data lineage can be stored. Such a metadata tool would be very handy for analysts and developers to really understand what makes up each data element. It can also provide valid context so that an effective visualization solution can be developed. The other aspect that helps provide improved visualizations is data modeling: when done correctly with a proper framework, it can provide immense benefits to the overall data services/visualization strategy. It is extremely important to know the data that is being worked on and extracted. There are products in the market that cover the data lineage piece of the puzzle.
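A lineage repository can be sketched as a set of records mapping each target data element back to its sources and transformations. The structure, column names, and transformation labels below are illustrative, not a standard:

```python
from dataclasses import dataclass, field

# A minimal sketch of a lineage repository: each target data element
# records where it came from and how it was transformed on the way.
@dataclass
class LineageRecord:
    target: str                      # column in the data mart
    sources: list                    # upstream system.table.column refs
    transformations: list = field(default_factory=list)

repository = [
    LineageRecord("mart.customer.full_name",
                  ["crm.cust.first_nm", "crm.cust.last_nm"],
                  ["trim", "concatenate"]),
    LineageRecord("mart.sales.net_amount",
                  ["erp.order.gross_amt", "erp.order.discount_amt"],
                  ["gross_amt - discount_amt"]),
    LineageRecord("mart.customer.region_cd",
                  ["crm.cust.state_cd"],
                  ["lookup against region reference table"]),
]

def trace(target: str) -> LineageRecord:
    """Answer the analyst's question: what makes up this data element?"""
    return next(r for r in repository if r.target == target)

print(trace("mart.sales.net_amount").sources)
```

Even a simple repository like this lets an analyst see at a glance whether a report column passed through untouched, was calculated, or came from a lookup.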
Pyramid Analytics, one of the more popular vendors providing really good analytics tools, has come out with a product called BI Office that has a Governed Data Discovery platform. Please see the link below for all the details:
http://www.pyramidanalytics.com/pages/bi-office/governed-data-discovery.aspx.
As per the website, here is an overview of the data governance feature in BI Office:
"BI Office provides you with self-service content creation in a centralized, shared paradigm – that also tracks the content life-cycle".
When companies and businesses build a data services/data provisioning strategy, data governance needs to be an integral part of the whole vision. As usage begins to scale and there is more demand for services, data governance becomes more and more critical to making the data services strategy successful.

Friday, January 15, 2016

SSRS 2016-New Features...

With a plethora of reporting and data visualization tools available, SSRS was getting buried in the mix. The question was whether SSRS 2016 would get any major updates or would slowly fade away as other tools took prominence. The good news is that SSRS 2016 is getting a lot of new features; here are some of them. Reporting Services Web Portal: the usual SSRS portal has undergone a major revamp, and it now looks more in tune with the current dashboard offerings from other vendors. One of the major changes has been the use of HTML5 in SSRS 2016, which should allow reports to be portable across devices and make the offering more effective. There is also a new feature called Mobile Report Publisher, with which you can publish SQL Server mobile reports to your reporting portal. For a more detailed explanation of SSRS 2016 features, please refer to the link below:
https://msdn.microsoft.com/en-us/library/ms170438.aspx