Tuesday, January 22, 2013

PowerPivot-DAX

Self-service business analytics and in-memory analytics are a couple of streams that, judging by various publications and technical magazines, are going to see a lot of coverage this year. I had written some basic blog posts about PowerPivot recently. One of the big weapons in the SSAS arsenal has been the MDX language. Yes, MDX has quite a steep learning curve, but the ones who achieved a certain level of mastery of it are able to extend the SSAS cubes they build to provide various kinds of analytic information. In the same fashion, in order to provide extended capabilities in PowerPivot, there is a new language available in the PowerPivot domain called DAX. The purpose of DAX is very similar to that of MDX in SSAS: the language provides a lot of functions to calculate different types of aggregations, relationships and averages. DAX offers simple functions and goes all the way up to more complex ones.
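As a small illustration (the FactInternetSales worksheet and its SalesAmount and TotalProductCost columns are just assumed names from a typical sales model, not from any particular project), a calculated column and a measure could look like this:

=FactInternetSales[SalesAmount] - FactInternetSales[TotalProductCost]
Total Sales:=SUM(FactInternetSales[SalesAmount])

The first formula is typed into a new calculated column on the fact worksheet and gives a per-row margin; the second is a measure (entered in the calculation area in newer versions of the PowerPivot window) that sums the column for whatever slice a pivot table asks for.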
When a user is in the PowerPivot window, for example where there are different worksheets for the dimension and fact tables, DAX functions can be used to derive data. Let us say there is a worksheet called DimProduct which has all the columns of the DimProduct table. One can add a calculated column and type in the RELATED function, which is available as part of the DAX library. In this case one would type =RELATED(DimTable[ColumnName]); the function follows the existing relationship between the current dimension table (say DimProduct) and the other table named in the RELATED call, and returns the related value from that table for each row. As one can see, it is possible to pull values across the relationships between tables, which can be very helpful in creating computed columns that are later used in PowerPivot analysis done with pivot tables.
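For example, assuming the model also contains a DimProductSubcategory worksheet that DimProduct is related to (the table and column names here are only illustrative), a calculated column on DimProduct could pull the subcategory name across that relationship:

=RELATED(DimProductSubcategory[EnglishProductSubcategoryName])

Each row of DimProduct then shows the subcategory it belongs to, and the new column can be used like any other field when building a pivot table.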

Wednesday, January 16, 2013

T-SQL (INSERT)-SQL Server 2008

T-SQL has constantly evolved right from its very beginning, and a lot of new features have been added to it over the years. I was working on a project where I was creating a stored procedure and writing an INSERT statement. There was a need to capture the values inserted into a table and assign them to a variable. The INSERT statement has a clause called OUTPUT, which gives access to the inserted table, very similar to how the inserted table is available in triggers. The purpose of the inserted table is to hold the values that have just been inserted. For example, an INSERT statement with an OUTPUT clause could look like the following:
INSERT INTO dbo.TestTable
    (BatchId,
     ServiceId)
OUTPUT inserted.BatchId, inserted.ServiceId
SELECT BatchId,
       ServiceId
FROM dbo.LookupTable
WHERE ProcessName = @ProcessName;

In the above example the values of BatchId and ServiceId are routed to the inserted table under the columns BatchId and ServiceId. Just having these values in the inserted table would not be of much use; they need to be captured. In order to capture them, one can extend the INSERT statement in the following way (the @out table variable must be declared before it is used):

DECLARE @out TABLE (BatchId INT, ServiceId INT);  -- assuming integer columns; match the actual source data types

INSERT INTO dbo.TestTable
    (BatchId,
     ServiceId)
OUTPUT inserted.BatchId, inserted.ServiceId INTO @out (BatchId, ServiceId)
SELECT BatchId,
       ServiceId
FROM dbo.LookupTable
WHERE ProcessName = @ProcessName;

In the above example the values in the inserted table are routed to a table variable with the two columns. Once the statement is executed, one can get the values from the @out table variable:

SELECT BatchId, ServiceId FROM @out;

This feature gives an elegant way to capture information from an INSERT statement. The above set of queries was executed on SQL Server 2008.
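Putting the pieces together, here is a minimal self-contained sketch of the whole pattern. The table definitions, column types and process name are only assumptions made so the script runs on its own; substitute the real tables from your schema:

DECLARE @ProcessName VARCHAR(50);
SET @ProcessName = 'NightlyLoad';   -- hypothetical process name

-- Throwaway versions of the two tables so the sketch is runnable on its own
CREATE TABLE dbo.LookupTable (ProcessName VARCHAR(50), BatchId INT, ServiceId INT);
CREATE TABLE dbo.TestTable (BatchId INT, ServiceId INT);
INSERT INTO dbo.LookupTable (ProcessName, BatchId, ServiceId) VALUES ('NightlyLoad', 101, 7);

DECLARE @out TABLE (BatchId INT, ServiceId INT);

-- OUTPUT ... INTO copies the inserted values into the table variable
INSERT INTO dbo.TestTable (BatchId, ServiceId)
OUTPUT inserted.BatchId, inserted.ServiceId INTO @out (BatchId, ServiceId)
SELECT BatchId, ServiceId
FROM dbo.LookupTable
WHERE ProcessName = @ProcessName;

SELECT BatchId, ServiceId FROM @out;   -- returns 101 and 7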

Wednesday, January 9, 2013

In Memory Analytics - Part 2

In my earlier blog post I had written about in-memory analytics and the Microsoft tools that use it. This segment is becoming increasingly popular, with new offerings from different companies. I guess that with the cost of memory and disk storage becoming cheaper, there could be plenty of in-memory product offerings. One such product offering outside of Microsoft is from Salient, http://www.salient.com/our-technology/. To quote from the website:
"Salient has pioneered a new business intelligence paradigm that merges the unique thought process of each individual with advanced data visualization. This enables everyone – non-technical as well as technical – to interrogate the data in order to discern patterns, trends and outliers. Users can understand the relationships among activities, costs and outcomes – and discover root cause in seconds."
One of their offerings is Salient ETL, which allows one to move from design to automation in 3 easy steps. It sounds very interesting.
From Design to Automation in 3 Easy Steps:

• Design based on business objectives
• Map source data to the logical design
• Automate updates of the data feeds
In summary, in 2013 there are going to be a lot of offerings in the in-memory analytics and Big Data domain.

Monday, January 7, 2013

SSIS - Audit Transformation...

In ETL operations/design it is pretty obvious that there is a lot of need for auditing the data that is brought into a Data Mart/Data Warehouse. In certain cases, depending on when the data is fed into a data mart, a lot of business transformations happen. While working on an SSIS package for one of my projects, I came across a component available in the Data Flow of SSIS called the Audit transformation. In order to use this component, one needs to attach it to the output of, say, an input data source. What is essentially being done is that, in addition to the input columns coming from the source, one can add extra audit columns. For example, in my project I had a flat file input source, so the first step was to drag and drop the flat file source into the Data Flow and set up the path and the input columns. Once this was done, I connected the Audit data flow component to the output of the flat file source, like the image shown below:


Now, in order to add the audit columns to the input data, right-click the Audit data flow component and choose Edit; the Audit Transformation Editor opens up. In the editor I added the audit columns I needed to be part of the destination data (audit types such as Package name, Execution start time, Machine name and User name), please see the image below:


This data flow component is a nice addition to the capabilities of SSIS, in case one wants to add audit information to the data being stored in a Data Mart/Warehouse.
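To show where those columns end up, here is a minimal sketch of a destination staging table that carries the audit values alongside the business columns. The table name, business columns and the particular audit columns chosen are only assumptions for illustration:

CREATE TABLE dbo.StgCustomerFeed
(
    -- business columns coming from the flat file (illustrative)
    CustomerId          INT,
    CustomerName        VARCHAR(100),
    -- extra columns fed by the Audit transformation at run time
    PackageName         NVARCHAR(128),
    ExecutionStartTime  DATETIME,
    MachineName         NVARCHAR(128),
    UserName            NVARCHAR(128)
);

The Audit transformation populates its output columns while the package runs, and the destination component simply maps them to these additional columns in the staging table.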

Wednesday, January 2, 2013

SSAS - Cube Processing...

In SSAS there are different options available for cube processing. Each one of the processing types has different uses, and it is good to understand what each of them does.

Process Types Available in SQL 2005 / SQL 2008 / SQL 2008R2 Analysis Services:

ProcessFull: Applies to all objects; it discards the storage contents of the object and rebuilds them. It applies to descendant objects as well.


ProcessDefault: Detects the current process state of the object and performs whatever processing is necessary to bring it to a fully processed state.
ProcessData: Processes the data only, without building aggregations or indexes.
ProcessIndexes: Creates or rebuilds indexes and aggregations for objects that already have their data processed; it does not touch the data itself.
ProcessUpdate: Applies only to dimensions (discussed in more detail below).
ProcessAdd: Applies to dimensions and partitions; it did not exist in SSAS 2000. It essentially optimizes ProcessUpdate in scenarios where you are only adding new records to dimensions and facts.


ProcessClear: Discards the storage contents of the object. It applies to all objects and their descendant objects.
The ProcessUpdate option is a tricky one: it applies only to dimensions. It is the equivalent of incremental dimension processing in Analysis Services 2000. It sends SQL queries to read the entire dimension table and applies the changes (member updates, additions, deletions).


Since ProcessUpdate reads the entire dimension table, one might ask, "How is it different from ProcessFull?" The difference is that ProcessUpdate does not discard the dimension storage contents. It applies the changes in a "smart" manner that preserves the fact data in dependent partitions. ProcessFull, on the other hand, does an implicit ProcessClear on all dependent partitions. ProcessUpdate is inherently slower than ProcessFull since it does additional work to apply the changes.
Depending on the nature of the changes in the dimension table, ProcessUpdate can affect dependent partitions. If only new members were added, the partitions are not affected. But if members were deleted or member relationships changed (e.g., a customer moved from Redmond to Seattle), then some of the aggregation data and bitmap indexes on the partitions are dropped. The cube is still available for queries, albeit with lower performance. (With ProcessUpdate, flexible aggregations and indexes on related partitions will be dropped.)

Thanks to Karan Gulati (From Microsoft SSAS/BI Group)