
Monday, October 6, 2014

Hive, BI Best Practices, Machine Learning...

The SQL Saturday BI edition held in Charlotte on October 4, 2014 had some interesting sessions, which I attended. They were really insightful and provided some good information that I would like to share. The first session was given by Kevin Goode on the topic of Introduction to Hive; the session was really interesting, with lots of questions and answers. Hive is basically a layer on top of Hadoop that allows users to write SQL-style queries. Hive takes these queries and converts them into MapReduce jobs, which are executed by the Hadoop layer; the results are then returned to the user. Hive uses a language called HQL, a SQL variant with lots of similarities to MySQL and Oracle syntax. The query constructs of HQL are very similar to Transact-SQL. One of the main things to keep in mind is that HQL/Hive does not have a great optimizer like SQL Server, so while constructing join statements in HQL one has to be very careful about the placement of the tables in the join. The rule of thumb for now seems to be that the smaller tables come first and then the larger tables. Hive/HQL is best suited for analytical queries; it has no updates or deletes. HQL/Hive also provides a type of execution plan, but these are not as sophisticated as the ones you see in SQL Server. The error messages produced by HQL when there are issues are more Java based, so they can take a little getting used to. While working with Hive/HQL the developer should be aware of the type of data coming in and how it is partitioned; the organization of the data really helps in optimizing HQL queries. The results produced by HQL can be used to feed a traditional data warehouse system. One of the areas where Hadoop is weak is security: security is not very structured in Hadoop, and it has to be achieved using other bolt-on tools.
Hortonworks is one of the leading distributors of Hadoop, and they provide information regarding HQL. The front end they provide to connect to Hive is completely web based, and users can use this tool to write Hive queries in HQL. Hortonworks is one of the big players in Hadoop distributions: please visit the Hortonworks website here.
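The table-ordering rule of thumb follows from how Hive executes a join: the tables listed earlier are buffered in memory while the last table is streamed. Here is a minimal sketch of that hash-join idea in plain Python (illustrative only, not Hive code; the table and column names are made up for the example):

```python
# Hive buffers the earlier join inputs in memory and streams the last one,
# which is why the smaller tables should come first and the largest last.
# This toy hash join shows the same mechanic.

def hash_join(small_rows, large_rows, key):
    """Build a hash table on the small input, then stream the large one."""
    buffered = {}
    for row in small_rows:                 # held in memory - keep this small!
        buffered.setdefault(row[key], []).append(row)
    for row in large_rows:                 # streamed row by row
        for match in buffered.get(row[key], []):
            yield {**match, **row}

dims = [{"dept_id": 1, "dept": "Sales"}, {"dept_id": 2, "dept": "IT"}]
facts = [{"dept_id": 1, "amount": 100}, {"dept_id": 2, "amount": 250},
         {"dept_id": 1, "amount": 75}]

joined = list(hash_join(dims, facts, "dept_id"))
print(len(joined))  # 3 joined rows
```

If the large fact table were buffered instead, memory would fill up quickly, which is exactly the mistake the rule of thumb guards against.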
The second session I attended was on tabular models: how to build a tabular model using Visual Studio 2012 with a SSAS 2012 tabular instance installed. It was an informative topic, as it brought out the differences between traditional MDX-based SSAS and tabular solutions. One of the key differentiators is that traditional SSAS is disk based while tabular is in memory and provides very good speeds. The limitation of the tabular model is the amount of memory available to hold the models. Also, with tabular you don't have writeback capabilities. The demo was pretty impressive; some of the steps were very similar to how one would go about building SSAS cubes. The topic was presented by Bill Anton and his blog is located at:
The third session I attended was the most crowded of all. It was about best practices for delivering BI solutions. The audience were all BI professionals working in IT at various companies. SQL Server BI expert James Serra provided a great session on best practices. He started off with why BI projects fail: a Gartner study has revealed that around 70% of BI projects fail. The failure factors range from lack of expertise/experience to lack of proper communication, requirements and project management. He highlighted the clash between business and IT over how the solution should be delivered. One interesting suggestion was for IT to provide a clean data warehouse and build some abstraction layers on top of it; once that is done, allow the users to utilize self-service BI solutions. Understand the difference between Kimball and Inmon principles, the capabilities of tabular vs. multidimensional, and star schema vs. relational. Please visit his website for some great information on this topic and other aspects related to data warehousing.
The final session I attended was an introduction to Microsoft Azure Machine Learning; this topic was really exciting and informative. It is basically about predictive analytics and the tools Microsoft provides as part of the Azure framework. Microsoft has a group called Microsoft Research which initially worked on developing the algorithms for the data mining product that is part of SSAS. In the past few years there has not been much of a push for data mining within SSAS; recently, with Microsoft making a big push for cloud service offerings as part of the Azure framework, the Microsoft Research work is being made part of Machine Learning in Microsoft Azure. Here is a link that provides a head start on the machine learning topic: http://azure.microsoft.com/en-us/documentation/services/machine-learning/. During the session Bill Carroll provided an insightful demo of the machine learning framework within Microsoft Azure. Azure provides a very nice framework for a developer to set up an experiment; the workspace for setting up experiments looks very similar to the SSIS workspace within Visual Studio. As part of setting up an experiment, one feeds data to an algorithm based on the type of outcome that is needed; the model is then trained, scored and evaluated, and finally it can be published. Once the model is published, it can be called from an application through a web service. Code snippets are available in C# and Python to call the web service and execute the model. Some real-life examples of machine learning: when a user purchases a product on Amazon, a list of recommended products is generated, and Netflix recommends movies based on a user's watching preferences.
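The web-service call from Python follows the usual pattern of POSTing a JSON payload with a bearer key. The sketch below shows that pattern only; the endpoint URL, API key and column names are placeholders I invented, not a real scoring endpoint:

```python
# Sketch of calling a published scoring model over REST from Python.
# The URL, key and columns below are placeholders (assumptions), shown
# only to illustrate the request shape: JSON body + Authorization header.
import json
import urllib.request

def build_request(url, api_key, columns, rows):
    body = json.dumps({"Inputs": {"input1": {"ColumnNames": columns,
                                             "Values": rows}}}).encode("utf-8")
    headers = {"Content-Type": "application/json",
               "Authorization": "Bearer " + api_key}
    return urllib.request.Request(url, data=body, headers=headers)

req = build_request("https://example.invalid/score",   # placeholder endpoint
                    "YOUR-API-KEY",
                    ["age", "income"], [[34, 52000]])
# To actually score against a live endpoint:
# response = urllib.request.urlopen(req); result = json.load(response)
```

The same request can be issued from C# with `HttpClient`; only the serialization code changes, not the pattern.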

Monday, September 15, 2014

SQL Teams - Working Together...

With so many products coming out of the gate, each trying to address one part of the BI/data warehouse problem space, one of the most important success factors is that the teams and folks involved in BI projects work together well. Since BI/data warehouse implementations involve cross-team interactions, it becomes very vital that the team works together for the bigger picture. One of the classic areas of debate is between the developer and the DBA. Please read the following link about the pain points between developers and DBAs, and how these can be resolved. I found this article to be very informative, and it could provide some tips. The article was written by David Poole, a Data Architect.
https://www.simple-talk.com/opinion/opinion-pieces/dbas-vs-developers-a-sad-tale-of-unnecessary-conflict/?utm_source=ssc&utm_medium=publink&utm_content=dbavsdev

Monday, September 8, 2014

SQL Saturday 2014...

SQL Saturday sessions are in full swing; these are one-day events with lots of different topics to listen to and learn from. The SQL Saturday BI Edition is going to be held in Charlotte on October 4, 2014. The Charlotte BI Group is excited to bring this event to you. The list of topics includes: ETL, Data Visualization, Self-Service BI, Data Quality, Master Data Management, OLAP, Big Data, SharePoint, Data Management, as well as plenty for the SQL Server administrators out there!
Location: Central Piedmont Community College (CPCC), Levine Campus, 2800 Campus Ridge Road, Matthews, NC 28105 (located in South Charlotte).

SQL Saturday link for Charlotte: http://www.sqlsaturday.com/330/eventhome.aspx

Friday, August 1, 2014

Data Integration

One of the areas that I have increasingly been working on these days is data integration. Businesses have a variety of database systems, and it is very rare to find a company running on a single database system or platform; there is an increasing need for different database systems and architectures to co-exist. The battle at times becomes very political when it comes to replacing one database/BI platform with another in its entirety. This is where data integration comes in, and there are lots of tools available in the market today in this space. The Gartner report for data integration tools is available now. Based on the report, Microsoft is in the Challengers quadrant; the Microsoft tools that make up its data integration framework, per the study, are SSIS and BizTalk Server. I have used SSIS a lot, but very rarely have I come across BizTalk Server in my projects. The reason Microsoft is in the Challengers quadrant is the lack of breadth of functionality; quoting the report: "Although Microsoft addresses core data integration requirements, non-bulk/batch data integration styles remain a gap in its offering relative to market demand. Microsoft's product strategy for aligning capabilities to data movement, transformation and orchestration (referred to as "information production") aims to broaden its market positioning". The leaders in the data integration space are 1. Informatica, 2. IBM, 3. Oracle. Oracle's main data integration tools are Oracle Data Integrator (ODI), Oracle Data Service Integrator, Oracle GoldenGate and Oracle Warehouse Builder (OWB). One of the weaknesses of the Oracle integration suite is the lack of available skill sets for the tools mentioned above. Companies like Cisco (Composite Studio) and Actian fall into the Visionaries category; Cisco is a new entrant in the data information management technologies market. Adeptia and Syncsort fall into the Niche Players category.
For the Complete report, please use the link below:

http://www.gartner.com/technology/reprints.do?id=1-1Y495V6&ct=140724&st=sb&mkt_tok=3RkMMJWWfF9wsRonv6TNe%252B%252FhmjTEU5z16e4qWqa3lMI%252F0ER3fOvrPUfGjI4DT8pgNK%252BTFAwTG5toziV8R7HNJc160s8QXBjm

The domain of data integration encompasses a wide variety of services which are increasing by the year.

Wednesday, April 23, 2014

Data Virtualization...

With companies/businesses having different types of data marts and data warehouses, there is a fear that these could become independent silos and fail to provide value to the business as a whole. A lot of resources (time, money, people) have been invested in building these data warehouses and data marts, and the business would like to get value from the disparate systems, so a concept that has been in play for quite a while is Data Virtualization. Data Virtualization could be defined as the process of building an agile data layer for easy data access and delivery. This is also sometimes referred to as a data abstraction layer, so that users can get access to data quickly. Having defined Data Virtualization, where does one use this concept? Here are some of the situations where Data Virtualization could be used.
1. Data Federation - Scenario: An application requires data from multiple incompatible data sources.
(Example: Federated Views, Virtual Data Marts)
2. Data Warehouse Extension - Scenario: The data warehouse does not contain the required data to create reports. (Example: Data Warehouse Extension)
3. Data Virtualization - Build an agile data layer for easy data access.
4. Big Data Integration - Scenario: Big Data needs to be combined with traditional data for analysis. (Example: Hadoop)
5. Cloud Data Integration - Scenario: On-site systems need to be integrated with applications running in the cloud. (Example: SaaS Application Integration)
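The data federation scenario (item 1) can be illustrated with a toy sketch: two separate source systems are exposed to the consumer as a single virtual result set, joined at query time rather than physically consolidated first. The source names and schemas below are invented for the example; the two in-memory SQLite databases stand in for the "incompatible" sources:

```python
# Toy illustration of data federation: join data from two separate
# sources at query time and present one virtual view to the consumer.
# The "crm" and "erp" databases and their schemas are made-up examples.
import sqlite3

crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Acme"), (2, "Globex")])

erp = sqlite3.connect(":memory:")
erp.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
erp.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 120.0), (1, 80.0), (2, 45.0)])

def federated_view():
    """Join across the two sources at query time - a 'virtual' view."""
    totals = dict(erp.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"))
    for cid, name in crm.execute("SELECT id, name FROM customers"):
        yield (name, totals.get(cid, 0.0))

print(list(federated_view()))
```

A real data virtualization tool does the same thing at enterprise scale, adding query pushdown, caching and security on top of this basic idea.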

Based on the scenario and the type of question we are trying to answer, Data Virtualization could be a solution. The concepts described here are based on an article related to Composite, a data virtualization/integration tool from Cisco (http://www.compositesw.com/data-virtualization/).

Monday, April 7, 2014

AS-IS - TO BE...Business Process Management

When I work on BI projects, one of the main reasons for initiating a project is to make a process better for the business and provide increased value in terms of revenue/profits. A good analysis and design of the business requirements yields very good results. One of the methodologies I have been exposed to as part of Business Process Management is the AS-IS/TO-BE analysis method. The first step in this method is to understand the current process, which is called the AS-IS state: here one builds a flow diagram of the present system/process in place. Once this is completed, the analysis of the current process takes place. In this step any possible flaws/deficiencies are identified, and any opportunities to optimise the process are discussed and documented. The goal here is to identify steps that improve the overall process and provide business value. Once this step is completed, we move on to the TO-BE stage, where the proposed/improved process is documented and the improvements/optimisations are captured. A walkthrough of the proposed process is performed to make sure that the design can be implemented. There are different tools that support the AS-IS/TO-BE methodology. One of the tools available is from Visual Paradigm; here is the link: http://www.visual-paradigm.com/. The specific product that helps with the AS-IS/TO-BE process is Logizian; here is a high-level description of the tool from the website. One of the reasons to write a blog post on this topic was to highlight the importance of analysis and design in a project where BI solutions are going to be implemented.
http://www.visual-paradigm.com/product/lz/

Logizian - BPMN 2.0 Business Workflow Design Software

Easy-to-use and cross-platforms business process design tool that supports business process modeling with Business Process Modeling Notation (BPMN) 2.0, Data Flow Diagram (DFD), decision table and organization chart. Logizian supports also advanced business modeling features like to document working procedures for workflow elements, animating business process diagram, process simulation, reports generation and publishing workflow design to website.

Friday, March 28, 2014

Gartner 2014 Report - BI and Analytics Platforms...

The Gartner report for 2014 on Business Intelligence and analytics platforms is out, and as expected the number of vendors competing in the BI and analytics space continues to grow. Niche vendors continue to enter the market, and at the same time some vendors have made it to the Leaders quadrant. The vendors were evaluated across 17 categories, divided into 3 main areas:

  • Information Delivery
  • Analysis
  • Integration
The vendors who made it to the Leaders quadrant are:
Tableau, Qlik, Microsoft, SAP, SAS, IBM, Oracle, TIBCO Software, MicroStrategy and Information Builders. Since I use the Microsoft suite of products the most, the areas of concern listed for it stand out: interactive visualizations, metadata management and mobile BI support. As per the report, Microsoft is trying to address the mobile BI space with Power BI. Currently, Microsoft BI products seem to be used mostly in the enterprise and small business worlds. One of the challenges Microsoft faces, I feel, is cross-product integration: we have the traditional Microsoft BI stack on one hand, the Office/Power BI suite on the other, and to add to the mix we also have SharePoint-driven BI capabilities. How do these different versions of BI products from Microsoft co-exist? At the same time, Microsoft is depending on cloud-based BI offerings to reduce complexity for smaller companies.
Three vendors in particular got a good mention with respect to the Leaders quadrant. They are:
Tableau
TIBCO Software
Qlik
The concerns for the above three vendors were mainly in the areas of customer support and sales. For Qlik, as per the report, there were mixed opinions about whether the product is enterprise ready.
In the report there was mention of two cloud based BI vendors:

Birst
Birst's BI platform is primarily a cloud-based offering. It includes a broad range of components, such as data integration, federation and modeling, a data warehouse with a semantic layer, reporting, dashboards, mobile BI and a recently announced interactive visualization tool.

GoodData
GoodData is a cloud BI and analytics specialist. It provides a range of front-end BI capabilities and packaged analytic applications that complement its comprehensive cloud and on-premises source data integration and cloud-based data warehouse platform.

Wednesday, April 29, 2009

Business Intelligence...

Business Intelligence is an area of specialisation which has seen growth in the past several years and continues to grow. So what is Business Intelligence? Simply put, it is aimed at providing information about what a business wants to know, when the business wants to know it. Business Intelligence also encompasses the field of Data Mining, which is aimed at unearthing hidden patterns in the data gathered by a particular entity. This provides the business with the capability of predicting certain trends in its business domain. For example, Data Mining techniques are/can be used for direct marketing campaigns and fraud detection, to name a few. There are lots of tools and techniques which facilitate Business Intelligence and Data Mining: vendors like Microsoft, Oracle and IBM provide Business Intelligence tools, and there are techniques like Neural Networks, Decision Trees and Machine Learning which are part of the Data Mining domain.
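To make the direct-marketing example concrete, here is a toy decision-tree sketch in plain Python that predicts whether a customer is likely to respond to a campaign. The attributes and thresholds are invented for illustration; real data mining tools learn such rules from historical data rather than hard-coding them:

```python
# A tiny hand-written "decision tree" over two invented attributes,
# standing in for a tree a data mining tool would learn from history.

def likely_to_respond(customer):
    """Predict campaign response from purchase and engagement counts."""
    if customer["past_purchases"] >= 3:    # frequent buyers: good targets
        return True
    if customer["email_opens"] >= 5:       # engaged but infrequent buyers
        return customer["past_purchases"] >= 1
    return False

print(likely_to_respond({"past_purchases": 4, "email_opens": 0}))  # True
print(likely_to_respond({"past_purchases": 0, "email_opens": 9}))  # False
```

Each `if` is a split node in the tree; scoring a customer is just walking from the root to a leaf, which is why decision trees are popular for explainable marketing and fraud models.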
There is another important aspect of Business Intelligence: Reports, Dashboards and Scorecards. These give a business a visual outlook on Key Performance Indicators, trends and projections, which in turn gives a good bird's eye view of overall business performance. To summarise, Business Intelligence, when utilised in a proper manner, will provide very good benefits and can be a value add for a business/organisation.