Tuesday, August 26, 2014

SPLUNK...Operational Data

In the Data warehousing  and BI world, there are numerous instances where we get text files/excel files as data inputs to a datawarehouse/data mart. As part of the handshake between the Source systems and traget systems we receive log files. These log files would contain summary information about the data being sent. Using the log file the target system would load validate the data being received, once data is validated the loading process would start. There are numerous techniques of doing this type of validation. In certain organizations there could be numerous processes that produce log files to other downstream systems that would need to consume the data. In addition to the log files there are also numerous amounts of machine data produced by different systems and applications that would be need to be monitored so that this type of information can be leveraged by organizations for various business groups. In order to mange this type of monitoring there is a tool provided by SPLUNK, web site: www.splunk.com. The company provides different types of monitoring tools: tools for operational intelligence, Cloud, BigData and Business Analytics. Please review the links below: for additional details.
Tools like SPLUNK provide business an opportunity to make sense of the operational/machine data and leverage positive results.

Thursday, August 7, 2014

SSIS Parameters-SQL Server

Given the nature of today's ETL operations there are lot of complex SSIS packages being built to handle different aspects of Data Processing. One of the key concepts involved with multiple SSIS packages is the passing of parameters between the packages. My friend Pinal Dave, a SQL Server Expert had bought out a excellent post in his famous SQL Server blog. Here is the link to the article.
Enjoy reading the article very informative...

Friday, August 1, 2014

Data Integration

One of the areas that i have increasingly working on these days is the area of data integration. Business/Companies have a variety of database systems and it is very rare that you find companies running out of one database system/platform. There is an increasing need for different database systems/architectures to co-exist. The battle at times becomes very political to replace one database/BI platform with another one in its entirety. This is where data integration comes in and there are lot of tools available in the ,market today in this space. The Gartner report for Data integration tools is available now. Based on the report Microsoft is in the Challengers quadrant, the data integration tools that encompass the Data integration framework as per the study are SSIS and BizTalk server. I have used a lot of SSIS but very rarely i have come across Biztalk servers is my projects. The reason why Microsoft is in in the challengers space is because of the lack of breadth of functionality, quoting the report: " Although Microsoft addresses core data integration requirements, non-bulk/batch data integration styles remain a gap in its offering relative to market demand. Microsoft's product strategy for aligning capabilities to data movement, transformation and orchestration (referred to as "information production") aims to broaden its market positioning". The leaders in the Data Integration Space are 1. Informatica 2. IBM 3. Oracle. Oracle's main data integration tools are Oracle Data Integrator (ODI), Oracle Data Service Integrator, Oracle GoldenGate and Oracle Warehouse Builder (OWB).  One of the weakness in the Oracle integration suite is lack of skill set available with the set of tools mentioned above. Companies like Cisco(Composite Studio) and Actian fall in the visionaries category. Cisco is a new incumbent in the data information management technologies market. Adeptia and Syncsort fall in the niche players category. For the Complete report, please use the link below:


The domain of data integration encompasses a wide variety of services which are increasing by the year.

Wednesday, July 16, 2014

Nice to Have Books...

With growing online content, downloadble books available for business intelligence, data warehousing, it is always go to have books handy on one's book shelf. When one has to read through several pages sometimes a book becomes very handy. Here is a blog written by SQL Server expert James Serra on what would be the nice to have books, it covers the areas of Data warehouse, Business Intelligence, Master Data Management and Consulting. Please refer to the link below: the list is pretty exhaustive.
Some of the books which I have are Ralph Kimball book on data warehousing and Microsoft Reporting Services 2012.

Monday, June 30, 2014

Renaming SYSSSISLOG table...

In SSIS we have different ways of logging the progress of the SSIS package, these can be tracked in files or can be done with a table. One of the popular ways of auditing the SSIS package is using a SQL Server table. The table that is used is dbo.sysssislog table, this table is created by default when choosing the Logging of SSIS package to be done via SQL Server. There was a comment/question by one of the readers of my blog, as to whether the table can be renamed into something more meaningful. The answer is yes, here are the steps listed: The one thing to watch out for is to modify the sp_ssis_addlogentry stored procedure to point to the new renamed audit table.

1. Click the table dbo.sysssislog in the database that has the table.
2. Right click on the table and rename the table to what the new name should be.
3. In the SQL Server Management studio expand programmability and choose Stored procedures.
4. Right click on the sp_ssis_addlogentry stroed procedure and choose modify.
5. Replace the table dbo.sysssislog in the stored procedure with the new renamed table.
6. Press F5 to recompile the stored procedure.
7. This audit table is present in the SQL Server 2008 and SQL Server 2012 database systems.
8. In SQL Server 2005 databases, the table is called sysdtslog90.
9. The stored procedure is called sp_dts_addlogentry, it would be in the stored procedures folder in SSMS.

Tuesday, June 3, 2014


Hadoop is definitely making its presence felt in big organizations, it is being used as a complement to the existing traditional BI investments made by the companies. One of the projects which i am following, there is a very measured approach to getting hadoop into the ecosystem and making sure there is good value proposition for the company. Understanding the concepts involved with Hadoop and the infrastructure that goes with it will provide a good understanding of the concepts. There has been lot of talk about folks need to know java to work with hadoop. In certain cases finding such resources can cost a lot of money in the sense good java resources are expensive. Anyways I don't want to get into that debate. One of the key functions that is involved working with hadoop/Big data is ETL. We still need to be able to move the data from the HDFS/map reduce outputs to say other traditional data warehouses, basically perform data integration. One of the tools that is available in the market today is from a vendor called Syncsort (http://www.syncsort.com). They have a tool called DMX Express which is very useful for performing ETL/Data integration functions. As per the vendors website, DMX Express "Accelerates Data Integration performance, eliminate manual coding, and reduce total cost of ownership to capitalize on the opportunities of Big Data". There is another tool called DMX-h, this tool as per the vendor:"From comprehensive connectivity and optimized sort capabilities, to point-and-click interfaces for developing MapReduce jobs, DMX-h provides everything organizations need to successfully deploy ETL in Hadoop. No coding, no scripting, no tuning; just smarter data integration". I had the opportunity to see the demo of DMX Express the user interface of the tool looks similar to the tasks we have in SSIS. Folks who have worked with SSIS should find the transition a lot easier. Also as per the person who was doing the demo there is no prerequisite of knowing Java, the tool takes care of what has been done by writing Java classes.
DMX Express link: http://www.syncsort.com/en/Products/DMX/DMX-Overview.

Tuesday, May 13, 2014

Data Analytics-SQL Server...

One of the more popular term that is doing rounds these days is the term called Data Scientists. This seems to have emerged alongside the growth of Big Data. As part of the Data scientists role it is expected to have a good knowledge of statistical tools and data modelling concepts using mathematics plus exposure tools like SAS,SPSS and R. One of the tools i came across recently which is being used in the Big Data world is R. Today I had the opportunity to attend a session organized by PASS on SQL Server and R, this was presented by Tomaz Kastrun. The session focused on how  R can be used with SQL Server to perform Data Analysis. As per Tomaz:
"The Language R  is a powerful language for  data analysis  with all the great features  for data import from SQL Server Environment".  The Language R along with SQL Server data would be a powerful tool for data scientists to use wide range of statistics. There are statistical functions in SQL Server as part of T-SQL but they tend to be more of the standard/simplistic statistical functions. In case we have to use more complex statistical functions then the T-SQL Code would get longer and more complex which would lead to maintainability issues. This is where tools like R Language come in which can be used with SQL Server to produce very effective results. Here is a definition/purpose of R-Language (which was presented as par of the topic).

In order for the SQL Server Integration with R, one would need the following:
SQL Server
R-Language (needs to be installed) - This is available at the site: http://www.rstudio.com
Data (of Course), it would be good to cleanse the data before using for analysis with R.

The topic presented shows different opportunities that exists to leverage current tools like SQL Server and combine them with a powerful statistical language like R. This provides different analytical capabilities for the Data Scientists/Developers.