Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

James Serra's Blog

James is currently a Senior Business Intelligence Architect/Developer and has over 20 years of IT experience. James started his career as a software developer, then became a DBA 12 years ago, and for the last five years he has been working extensively with Business Intelligence using the SQL Server BI stack (SSAS, SSRS, and SSIS). James has been at times a permanent employee, consultant, contractor, and owner of his own business. All these experiences along with continuous learning has helped James to develop many successful data warehouse and BI projects. James has earned the MCITP Business Developer 2008, MCITP Database Administrator 2008, and MCITP Database Developer 2008, and has a Bachelor of Science degree in Computer Engineering. His blog is at .

When to use T-SQL or SSIS for ETL

When doing ETL, you have the choice of using T-SQL or SSIS.  What things should you consider when deciding which one to use?  Here are some of the major design considerations to think about:

  • Performance - With T-SQL, everything is processed within the SQL engine.  With SSIS, you are bringing all the data over to the SSIS memory space and doing the manipulation there.  So if speed is an issue, usually T-SQL is the way to go, especially when dealing with a lot of records.  Something like a JOIN statement in T-SQL will go much faster than using lookup tasks in SSIS.  Another example is a MERGE statement in T-SQL has much better performance than a SCD task in SSIS for large tasks
  • Features/capabilities – Some features can only be done in either T-SQL or SSIS.  You can shred text in SSIS, but can’t in T-SQL.  For example, text files with an inconsistent number of fields per row can only be done in SSIS.  So certain tasks may force you into using one or the other
  • Current skill set – Are the people in your IT department more familiar with SSIS or T-SQL?
  • Ease of development/maintenance – Of course, whatever one you are most familiar with will be the easiest, but if your skills at both are fairly even, then SSIS is usually easier to use because it is graphical, but sometimes you can develop quicker in T-SQL.  For example, having to join a bunch of tables will require a bunch of tasks in SSIS, where in T-SQL it is one statement.  So it might be easier to create the tasks to join the tables in SSIS, but it will take longer to build then writing a T-SQL statement
  • Complexity – SSIS can be more complex because you might need to create many tasks to accomplish your objective, where in T-SQL it might just be one statement, like in the example above for joining tables
  • Extensibility – SSIS has better extensibility because you can create a script task that uses C# that can do just about anything, especially for non-database related tasks.  T-SQL is limited because it is only for database tasks.  SSIS also has logging, which T-SQL does not
  • Likelihood of depracation/breaking changes – Minor issue, but T-SQL is always removing features in each release that will have to be rewritten
  • Types/architecture of sources and destinations – SSIS is better if you have multiple types of sources.  For example, it works really well with Oracle, XML, flat-files, etc.  SSIS was designed from the beginning to work well with other sources, where T-SQL is designed for SQL Server and it requires more steps to access other sources, and there are additional limitations when doing so
  • Local regulations – Are there some company standards you have to adhere to that would limit which tool you can use?

If you decide T-SQL is the way to go and you just want to execute a bunch of T-SQL statements, it’s still a good idea to wrap them in SSIS Execute SQL Tasks because you can use logging, auditing and error handling that SSIS provides that T-SQL does not.

Note that most people use a hybrid approach, where you use SSIS, but for certain situations use an Execute SQL Task in SSIS to execute T-SQL instead of using a data flow.  And that Execute SQL Task can also call a stored procedure.  Some like this because it allows you to make a change in the stored procedure, avoiding having to make a change in SSIS and re-deploying the package.  The negative side of this is that instead of having everything contained within SSIS, you instead have to jump back and forth between SSIS and SSMS when you are building or debugging a project.  It’s a balance you will have to weigh as you go along.

More Info:

Video ETL Head-To-Head: T-SQL vs. SSIS by Tim Mitchell

Comments

Posted by Anonymous on 30 August 2011

Pingback from  Dew Drop – August 30, 2011 | Alvin Ashcraft's Morning Dew

Posted by Jamie Thomson on 30 August 2011

James,

Good article. I wrote a lengthy reply but accidentally hit CTRL-W and lost the whole damn lot :)

I can't be bothered writing it all out again but I will bring up what I think is one vital point; that is, you haven't explicitly mentioned the one true differentiator of SSIS - its ability to combine data from different data sources and operate upon it *in a single operation*.

Everything else is by-and-large subjective.

Extensibility? T-SQL has got SQLCLR.

Complexity? That simply depends on one's opinion.

Ease of dev - Again, a matter of opinion.

The one absolute differentiator of SSIS dataflows is their heterogeneity.

JT

P.S. Can you give an example of the following:

"text files with an inconsistent number of fields per row can only be done in SSIS"

I don't see what about this scenario makes it easier to deal with this in SSIS (except for the fact that it has the Flat File source - but I don't think that's what you're getting at)

Posted by Jason Brimhall on 30 August 2011

Nice write-up.  A lot does depend on the things you described.  As Jamie also said - again much of it depends.

I do like the ability to more easily combine multiple data sources in ssis in a single data set.

Posted by Lee Everest on 3 September 2011

I am working with mainframe files (VSAM, not DB2 relational data) that have 2000, 3000, 5000 columns, many of which are 'occurs' that in no way shape fashion or form SSIS can handle; consuming these and accounting for changes to their accompanying copybooks is best done in TSQL. Maybe I'll share sometime, either on my blog or <deep breath and exhale> here on SQLServerCentral.

Lee

Posted by Bruce W Cassidy on 3 September 2011

Interesting summary.  I think some of those points could be argued either way, but they are still valid considerations.

I've ended up wrapping T-SQL inside SQL Server Agent tasks, which have logging/auditing, concurrency and error handling (along with the error handling capability of T-SQL itself.)  The logging isn't as fine-grained as what is available within SSIS, but I haven't found that an issue in practice.

I've also found code generators based on meta-data easier to write within T-SQL than SSIS.  That's a pattern I try to utilise as much as possible with ETL engines as it reduces the amount of code to write and maintain along with making the engine more flexible.

The thing I find amusing is the performance point.  When SSIS became available, it was intended to be a high performance ETL solution.  But now it seems that T-SQL is the way to go.  Hmm.  I'll have to try and do some high volume "real world" tests and get a handle on the performance differences myself.  

Posted by Andrew Pascal on 5 September 2011

Like Bruce, I find that code generation from metadata makes repetitive tasks much easier to stamach in T-SQL than in SSIS. I, too, am working on some performance tests to compare the run-time speed of SSIS versus T-SQL.

One key consideration that you may have overlooked is reusability. A T-SQL stored procedure can be re-used; an SSIS task doesn't lend itself to re-use. We have been burned several times by copying and pasting in SSIS packages and then having to go and fix the same bug in dozens of packages!

I would also suggest that unit testing is simpler with T-SQL. A stored proc can be tested in isolation; so far, I haven't found a way to unit test anything smaller than a package in SSIS. I find it makes the testing very compicated. Any clever suggestions welcomed!

Posted by Jim Nichols on 5 September 2011

A good post with great follow up comments.

I would add one more advantage of SSIS is that it is better for managing concurrent load processes where for example you have 5 smaller tables loading concurrently with one large table and so on. It is cumbersome to implement concurrent process and multi-threaded functionality using T-SQL, even with the aid of batch files, powershell and SQLCMD.

I agree with the comments on metadata generated code being a necessity, especially in cases with many tables across several databases being involved. In the majority of cases since SSIS came to be, I have found using T-SQL is the most effective way to quickly get the ETL system operating and afterwards it is much easier to accommodate changes than using pure SSIS. The main exception for selecting SSIS would be where concurrent heterogeneous sourced data flows were paramount to the ETL system.

Since SQL Server 2008 introduced minimally logged inserts which essentially allows mimicking a Data Flow Task (DFT) set to fast load using T-SQL, I have found there are fewer cases where it makes sense to use SSIS.  

There have been situations when I was able to use the best of both worlds where SSIS has execute SQL tasks that point to SQL files that contain the code to be executed. That allows for auto-generated SQL to be used and alteration of the T-SQL to occur without opening the SSIS packages.

Posted by Kristian Wedberg on 10 September 2011

Completely agree with the thrust of the article. On performance I'll add that it very much depends. As usual.-) For example:

* Out of box SSIS SCD component is pretty awful, including performance wise. Roll your own SCD load using other out of box (synchronous) components and it will typically be significantly faster than SQL.

* More generally, SSIS data flow in memory operations (ideally synchronous, but also asynchronous ones that fit in memory) are usually significantly faster than a SQL implementation that is doing disk IO.

* On the flip side, when the SSIS data flow starts to swap to disk, it typically degrades very quickly and becomes much (sometimes terminally) slower than SQL.

So choose your weapons carefully :-)

Posted by darshan.di on 4 May 2012

Nice post with a lot of useful comments.

I am of the opinion that if performance is not an issue or if you have good hardware then SSIS can be a good option.

In my package in one of my project i used a look up transformation 3 times on a same table (table had 2.5 lakh records) and that gradually ate up my virtual memory of 3 GB during runtime.I had to allocate extra virtual memory from another drive...

Leave a Comment

Please register or log in to leave a comment.