Site icon Martin's Blog

ETL for Data Warehouses – The SSIS vs. T-SQL debate

I see many questions about using the built-in components in SSIS versus writing T-SQL code to perform dimension/fact updates, especially type 2 dimension changes (changes that track history).

In the old DTS days (SQL Server 2000), the choice was pretty simple and straight-forward. DTS (Data Transformation Services) was a great tool for controlling workflow and automating the environment, but it was terrible at performing in-memory and/or set-based operations. Add to that the fact that we had no components specifically for loading data warehouse dimensions, and your mind was made up: Control the workflow with DTS, stage the data in staging tables and create stored procedures to load the dimensions. I suppose you could call it the old-school method.

With SSIS today (SQL Server 2005/2008) the choice is much harder. The tool has improved by leaps and bounds. We now have the ability to cache lookup data, perform much more sophisticated transformations, and above all we have a component dedicated to loading dimensions. These are all very appealing and make the choice so much harder…exponentially harder if you are new to Business Intelligence or SSIS.

So the question remains: Which one should I choose for loading dimensions/facts?

From my experience developing ETL processes for a few large data warehouses, I still believe that T-SQL is the way to go (yes, call me old-school)…and here are the reasons for my argument:

 

The above are some of the basic reasons I prefer to control the workflow with SSIS, but use stored procedure calls within the packages to perform all the work when loading dimensions & facts. You may argue that I am oversimplifying things, that you should use a combination of tasks (cached lookup transforms etc.) to get the most out of SSIS and the fact that certain operations could be done while the data is still in memory.

That is of course true, and proves the fact that this debate is too complicated and diverse to conclude within the confines of a single (non short-story) blog post. There are too many scenarios to cover, but the one thing I can say with certainty is that the T-SQL approach has never left me in a dead-end.

My final word of advice in terms of this debate is this: Be sensible. Know what SSIS does well and use that to your advantage. But don’t use SSIS components as an easy way out. The fact that a component is there doesn’t mean it should be used.

T-SQL is the fundamental language of SQL Server, and was designed to perform well with sets of data. Learn how to use it, and you will reap the benefits for years to come.

As a side note, I’ve just finished setting up my schedule for the PASS Summit next month. There will be a session on this topic, so please attend if you want to get some other points of view.

Exit mobile version