With the advent of public clouds like AWS, Google Cloud, and Azure, and the adoption of these services by various businesses and organizations, the main talking points are how data can be stored in the cloud, security concerns, and architecture. In certain organizations the move to the cloud has been very quick; in certain sectors adoption has been fairly slow, primarily due to security concerns, though these challenges are steadily being overcome. In terms of data services, one cloud platform that has been very popular for the last few years, and is also getting ready to go for an IPO, is Snowflake. The link for the company is www.snowflake.com. Snowflake is a global platform for all your data services, data lakes, and data science applications. Snowflake is not a relational database but supports basic SQL operations, DDL, DML, UDFs, and stored procedures. Snowflake uses Amazon S3, and now Azure as well, as the public cloud platform for providing its data services. In terms of the database, Snowflake's architecture uses columnar storage to enable faster query processing. Data is loaded as files into user areas on Amazon S3 and then moved into Snowflake schemas/databases so it can be queried. Please refer to the Snowflake company website for additional information on the architecture, blogs, and other kits available for one to check out all the features. Snowflake takes advantage of Amazon S3's storage power and uses its own columnar and other data-warehouse-related features for computational purposes. One can also refer to YouTube for additional details on the Snowflake architecture; here is a link that can be used: https://www.youtube.com/watch?v=dxrEHqMFUWI&t=14s.
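To illustrate why columnar storage speeds up analytic queries, here is a toy sketch in Python. This is only an illustration of the general idea, not Snowflake's actual internals: a column-oriented layout lets a query scan just the columns it needs, instead of reading every field of every row.

```python
# Toy comparison of row-oriented vs column-oriented storage.
# Illustrative only; not a model of Snowflake's implementation.

# Row-oriented: each record stores every field together.
rows = [
    {"id": 1, "region": "EAST", "amount": 120.0},
    {"id": 2, "region": "WEST", "amount": 75.5},
    {"id": 3, "region": "EAST", "amount": 200.0},
]

# Column-oriented: each column is stored (and scanned) independently.
columns = {
    "id": [1, 2, 3],
    "region": ["EAST", "WEST", "EAST"],
    "amount": [120.0, 75.5, 200.0],
}

# SELECT SUM(amount) WHERE region = 'EAST'
# Row layout: every field of every row is touched.
row_total = sum(r["amount"] for r in rows if r["region"] == "EAST")

# Columnar layout: only the two columns the query needs are touched.
col_total = sum(
    amt
    for region, amt in zip(columns["region"], columns["amount"])
    if region == "EAST"
)

print(row_total, col_total)  # both 320.0
```

In a real engine the columnar layout also compresses better, since each column holds values of one type.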
Monday, September 14, 2020
Thursday, September 10, 2020
AI, Machine Learning, Data Governance
Artificial Intelligence and Machine Learning have continued to penetrate all walks of life, and technology has undergone a tremendous amount of change. It is said that data is the new oil, which has propelled AI and ML to greater heights. In order to use AI and ML more effectively in business today, it is imperative that all the stakeholders, consumers, and technologists understand the importance of data. There should be very good collaboration between all the parties involved to make good use of data and take it forward to use AI and ML effectively. For data to be used effectively in an organization, we need proper guardrails to source the data, clean the data, remove unwanted data, and store and provision data to various users. Here is where data governance comes in: there has to be an enterprise-wide appreciation for having such a process and standard. It should not come off as process-heavy or bureaucratic, but as something that is efficient and at the same time able to manage data effectively. As organizations grow, there is going to be both a vertical and a horizontal implementation of data governance, and the two need to be in sync. This in turn is very essential for AI and ML efforts, because it will make the outcomes more meaningful to the organization. In addition, better contexts will be defined, which will make AI and ML projects more viable, reduce inefficiencies, and provide cost benefits.
One of the important steps in achieving the above is to have very good data cataloguing measures in place: persist all the logical and business entities and the lineage of all the data being sourced. The data also needs to be classified as NPI or non-NPI depending on the business context. In today's world the majority of the work mentioned above is manual, and a lot of time is spent trying to get SME inputs and approvals. This causes time delays and increased project cost, which can be alleviated by using the data discovery tools available today. There are quite a few tools available, but the one whose capabilities I have started to look into more is the tool from Atlan: https://atlan.com/. Atlan provides an excellent platform for performing data discovery, lineage, profiling, governance, and exploration. From what I have seen of the tool and the demo provided to me, the whole data life cycle has been very nicely captured. The user interface is very intuitive, and the tool helps the user navigate through the different screens without any technical input needed. The search is very Google-like in terms of looking up the different data assets that are available. I will be doing some more use cases and a deeper dive into the tool in the next couple of weeks and will provide more updates.
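The classification step above can be sketched in a few lines of Python. This is a hypothetical toy (the pattern list and column names are illustrative assumptions, not Atlan's API): a catalog column gets tagged NPI when its name matches a configurable list of sensitive patterns.

```python
# Hypothetical sketch of NPI tagging for data catalog entries.
# The pattern list and column names are illustrative assumptions only.

NPI_PATTERNS = ("ssn", "dob", "account_number", "email", "phone")

def classify_column(column_name: str) -> str:
    """Tag a column NPI if its name contains a known sensitive pattern."""
    name = column_name.lower()
    if any(pat in name for pat in NPI_PATTERNS):
        return "NPI"
    return "non-NPI"

catalog = ["customer_ssn", "order_total", "contact_email", "region"]
tags = {col: classify_column(col) for col in catalog}
print(tags)
```

Real discovery tools go further by profiling the actual values (not just names), but the rule-plus-review pattern is the same: automate the bulk, and route the ambiguous cases to SMEs.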
Monday, April 17, 2017
Blueprint provides a very good mechanism to manage and track requirements in a continuous fashion. The tool provides a very easy-to-use interface where one can maintain folder structures for a product. The product folder is broken down into the following categories:
1. There is a folder to maintain the current state of the product. This contains the various components of a product, which can be broken down into different sub-folders. Within each sub-folder you can maintain what Blueprint calls artifacts. Artifacts could be a Word doc, a Visio diagram, user interface mock-ups, use cases, or test scripts.
2. There is a folder called Product Management, which is used for the current requirements being worked on for the product.
3. There is a folder called Enterprise, where artifacts related to standards, compliance, and regulations can be maintained.
4. There is a folder called Archive, wherein artifacts can be archived and stored.
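The folder taxonomy described above can be sketched as a simple nested structure. This is purely illustrative (the component names and artifact labels are made up for the example; Blueprint manages this internally through its UI):

```python
# Illustrative sketch of the Blueprint product-folder taxonomy described
# above. Component and artifact names are hypothetical examples.
product = {
    "Current State": {
        "Component A": ["Word doc", "Visio diagram", "UI mock-up"],
        "Component B": ["use cases", "test scripts"],
    },
    "Product Management": ["requirements in progress"],
    "Enterprise": ["standards", "compliance", "regulations"],
    "Archive": [],
}

top_level = sorted(product)
print(top_level)
```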
Blueprint requirements software link:
Blueprint also provides integration with other applications. The most important with respect to agile is the integration with a tool called Rally. The one artifact in Blueprint that can be used for integration with Rally is called Process Storyteller. When a Process Storyteller artifact is added in Blueprint, it provides an interface into a tool called Blueprint Storyteller. In Blueprint Storyteller, one can create a process flow with decision points. Once the process flow is completed, it can be published into Blueprint and later used for integration with Rally.
Please see image below for how the storyteller interface looks...
Rally Software Link:
Friday, February 24, 2017
Here is the link for one of their products:
Quoting Alteryx: Alteryx Designer solves this by delivering a repeatable workflow for self-service data analytics that leads to deeper insights in hours, not the weeks typical of traditional approaches.
Please refer to the link to gain more insights into the Product Capabilities.
Sunday, December 25, 2016
Now I am providing a link which covers the expected trends in 2017. One of the expected trends in 2017, especially for developers in the BI/database arena:
Traditional programmers will be required to gain data science skills in order to stay relevant, employable, and effective in their careers.
Wishing everyone a very happy and prosperous new year 2017. I hope to write more articles in 2017.
Monday, October 24, 2016
Monday, September 19, 2016
The first session I attended was from SQL Server/Data Analytics expert Jen Underwood (please visit her excellent blog http://www.jenunderwood.com/ for more information and trends in analytics); the topic was trends in the analytics industry. The talk covered the skill sets and domains currently hot or growing in the Data Analytics/Big Data space. There were interesting demos on how certain jobs can be automated, and on how robots are beginning to permeate different aspects of human life and are helping out in areas like customer service. Here is a link to a robot video which shows human-like expressions and interactions:
Robotics. The interesting aspect of this demo is that rather than being just machine-like, the robot interacts in a very human-like fashion. These types of robots could replace jobs that can be easily automated. Other aspects covered included the cloud, Machine Learning, and Predictive Analytics. One of the other interesting areas mentioned was immersive data visualization, where the concept of 3-D visualization can be used to analyze and understand data. One of the visualizations shown was the stock market's rise and fall over the past several years, including the 2008 crash, presented as a roller-coaster-ride simulation. Here is the link for the demo: http://graphics.wsj.com/3d-nasdaq/. It is a virtual-reality guided tour of 21 years of the Nasdaq, a very interesting concept. One of the thoughts that went through my mind was how well such visualizations would work in certain types of businesses and organizations. On the whole the session was very informative with respect to what is coming in the Data Analytics space and how one needs to be prepared.
The second session was on Master Data Management by Rushabh Mehta (SQL Server expert, past SQL PASS president, independent consultant/CEO). This was a presentation on the very important but often ignored topic of data management. Rushabh went through why Master Data Management is important and discussed one of the projects he did for a client. In this project he explained the process of data cleansing, how records can be de-duplicated, and the use of standardized services from Melissa Data Services. Melissa Data Services provides services around address and name standardization, which are very useful when one tries to create a master record for a customer. Here is the link for Melissa and the services they offer: http://www.melissadata.com/. The session also provided insights into how a master record can be created for companies; here, services offered by Dun and Bradstreet were used. Overall the session was very informative and conveyed the importance of Master Data Management.
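The cleansing and de-duplication step can be sketched with a toy example in pure Python. This is only an illustration of the idea, not Melissa Data's actual standardization services: normalize the name and address fields before comparing, so trivially different records collapse into one master record.

```python
# Toy record de-duplication via normalization, illustrating the MDM
# cleansing step discussed above (not Melissa Data's actual service).

def match_key(record: dict) -> tuple:
    """Build a match key from lowercased, punctuation- and
    whitespace-normalized fields."""
    name = " ".join(record["name"].lower().split())
    addr = " ".join(record["address"].lower().replace(".", "").split())
    return (name, addr)

records = [
    {"name": "John  Smith", "address": "12 Main St."},
    {"name": "john smith", "address": "12 main st"},
    {"name": "Jane Doe", "address": "9 Oak Ave."},
]

masters = {}
for rec in records:
    masters.setdefault(match_key(rec), rec)  # first occurrence wins

print(len(masters))  # 2 master records
```

Real address standardization also corrects abbreviations, validates against postal data, and scores fuzzy matches, which is exactly what services like Melissa's provide.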
The third session, which I found very useful, was on Machine Learning with R by Paco Gonzalez, Corporate Center of Excellence, Program Manager at SolidQ. The session was very informative and very nicely presented. Paco touched upon how tweets can be imported from Twitter and analyzed to determine the sentiment around a particular topic or product being discussed. He took the example of a product being sold on an online clothing retailer's website and showed how tweets about that product can be scanned to understand whether people are speaking well or badly of it. He mentioned that one would get the feeds from Twitter and could also request Twitter for data relating to particular hashtags. Paco also presented case studies on how machine learning can be used to determine whether a particular customer would stay with a bank or leave. He demonstrated how past patterns can be used to train a model, and how test data can be used to determine the model's accuracy. The R integration with SQL Server 2016 seems very interesting and exciting; now one has the power of getting predictive results by executing stored procedures.
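As a minimal sketch of the tweet-sentiment idea from the session, here is a toy word-list scorer in Python (not the R model Paco showed; the word lists and tweets are made up, and a real model would learn its weights from labeled data):

```python
# Toy sentiment scorer for product tweets, sketching the idea from the
# session. Word lists are illustrative; real models learn from data.

POSITIVE = {"love", "great", "good", "excellent"}
NEGATIVE = {"bad", "hate", "poor", "terrible"}

def sentiment(tweet: str) -> str:
    """Score a tweet by counting positive vs negative words."""
    words = tweet.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

tweets = [
    "I love this jacket, great fit",
    "terrible quality, bad stitching",
    "just ordered the jacket",
]
labels = [sentiment(t) for t in tweets]
print(labels)  # ['positive', 'negative', 'neutral']
```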
There was a demo of the Microsoft Cognitive Services, which can be used for analysis of text, faces, and emotions;
here is the link: https://www.microsoft.com/cognitive-services.
Overall a very exciting SQL Saturday and a very good information gathering session.