Tuesday, June 3, 2014


Hadoop is definitely making its presence felt in big organizations, where it is being used as a complement to the traditional BI investments companies have already made. On one of the projects I am following, there is a very measured approach to bringing Hadoop into the ecosystem and making sure there is a good value proposition for the company. Understanding the concepts behind Hadoop and the infrastructure that goes with it is a good starting point for that kind of evaluation.

There has been a lot of talk that folks need to know Java to work with Hadoop. In certain cases finding such resources can cost a lot of money, since good Java developers are expensive. Anyway, I don't want to get into that debate.

One of the key functions involved in working with Hadoop/Big Data is ETL. We still need to be able to move data from HDFS/MapReduce outputs to, say, traditional data warehouses; in other words, perform data integration. One of the tools available in the market today is from a vendor called Syncsort (http://www.syncsort.com). They have a tool called DMX Express, which is aimed at ETL/data integration functions. As per the vendor's website, DMX Express "Accelerates Data Integration performance, eliminate manual coding, and reduce total cost of ownership to capitalize on the opportunities of Big Data". There is another tool called DMX-h, which as per the vendor offers the following: "From comprehensive connectivity and optimized sort capabilities, to point-and-click interfaces for developing MapReduce jobs, DMX-h provides everything organizations need to successfully deploy ETL in Hadoop. No coding, no scripting, no tuning; just smarter data integration".

I had the opportunity to see a demo of DMX Express. The user interface of the tool looks similar to the tasks we have in SSIS, so folks who have worked with SSIS should find the transition a lot easier. Also, as per the person who was doing the demo, there is no prerequisite of knowing Java; the tool takes care of what would otherwise be done by writing Java classes.
DMX Express link: http://www.syncsort.com/en/Products/DMX/DMX-Overview
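To give a feel for the kind of hand coding that ETL tools aim to replace, here is a minimal sketch of the "move MapReduce output into a warehouse" step mentioned above. It is not how DMX Express or DMX-h work internally. It assumes the job wrote plain text output with a tab between key and value (Hadoop's default text output layout), and it uses an in-memory SQLite database purely as a stand-in for a real data warehouse. The function names, the sample data, and the word_counts table are all made up for illustration.

```python
import sqlite3


def parse_mapreduce_output(lines):
    """Yield (key, count) pairs from tab-separated reducer output lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, value = line.split("\t", 1)
        yield key, int(value)


def load_into_warehouse(conn, rows):
    """Load rows into a hypothetical word_counts table (stand-in warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_counts "
        "(word TEXT PRIMARY KEY, total INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO word_counts (word, total) VALUES (?, ?)",
        rows,
    )
    conn.commit()


# Assumed sample of what a part-r-00000 file might contain.
sample_output = ["hadoop\t3\n", "etl\t5\n", "\n"]

conn = sqlite3.connect(":memory:")  # SQLite standing in for the warehouse
load_into_warehouse(conn, parse_mapreduce_output(sample_output))
```

Even this toy version has to deal with parsing, type conversion, and table setup by hand, which is roughly the sort of work the point-and-click tools are pitching to take off your plate.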
