SQL Clone
SQLServerCentral is supported by Redgate
Log in  ::  Register  ::  Not logged in

Better Know A SSIS Transform – The Sort Transform

This is part 2 of my 29 part series called Better Know A SSIS Transform.  Hopefully you will find the series informative.  I will tell you a little about each transform and follow it up with a demo basic you can do on your own.  Screenshots that go with examples can be viewed from my regular blog site.  http://blogs.pragmaticworks.com/devin_knight/

When developing SSIS packages you are bound find out that some transforms require your data to be sorted before it can accept new input rows (Ex. Merge & Merge Join).  The Sort Transform may be one of the more frequently used transforms for that reason, but it really should be one of the most avoided. 

Why Avoid the Sort Transform?

Hopefully after you read this section you will agree with me that the new title of this blog should be  Better Know a SSIS Transform Better Not Know a SSIS Transform.

The Sort Transform is a fully blocking asynchronous transform, meaning no rows can pass this transform until all rows have been sorted.  If you take a minute to think about it that makes sense.  You cannot send rows further down the Data Flow until all the rows have been sorted, so the Sort Transform holds up the show. 

The best way to visualize this is to think of a deck of cards.  You have a Conditional Split that wants to split the face cards from the numbered cards.  Before that can happen you have a Sort Transform that will sort all the cards by numbers and face value.  None of the cards can reach the Conditional Split until all the cards have been sorted causing our package to stall at the Sort Transform until all the rows have been processed.

So we know now we want to avoid it when possible.  The obvious way to avoid it is by using TSQL in your source, but what if we have a flat file source where you can’t write a TSQL ORDER BY.  In this case we’re stuck using the Sort Transform.

How to Avoid

  • When you are using a source that allows you to write TSQL then do it!
    • SELECT        ProductID, TransactionDate, Quantity, ActualCost
      FROM            Production.TransactionHistory
      ORDER BY ProductID

  • Now you need to tell SSIS that your data is sorted.  Right-click on the source and click Show Advanced Editor
  • Click the Input and Output Properties tab and select OLE DB Source Output
  • Change the IsSorted property to True  
  • Expand the OLE DB Source Output then expand the Output Columns.  Select the column(s) that are identified in the TSQL ORDER BY (in this case ProductID).
  • Change the SortKeyPosition property to 1.  The SortKeyPosition identifies the order in which columns are sorted.  This must be set for each individual column in the ORDER BY.  If we had multiple columns we were sorting then the the numbers would continue 1 for first column, 2 for second and 3 for third.  If the columns are ordered in descending order then the SortKeyPosition numbers would continue –1, –2 and –3.

Strategies for Using

When you have no other option but to use the Sort Transform (Ex. a Flat File Source) then be sure to use it wisely.  Think about the order of your Data Flow.  If you already have an Aggregate Transform in the package that’s performing a Group By then make sure to use the Sort Transform is after the Aggregate Transform.  This way instead of sorting 100,000 rows and then grouping them into 440 rows you can group 100,000 rows into 440 rows and only have to sort 440 rows.  **The Aggregate Transform is also a asynchronous transform that should be avoided when possible**

Configuring the Sort Transform

These steps will walk you through a basic configuration of the Sort Transform.

Example Overview

  • Add a Flat File Source pointing to any available flat file you have
  • Add Data Viewer to see data Presorted
  • Add a Sort Transform
  • Add Data Viewer to see data after being sorted
  • Use a Union All as a temporary destination

Step 1: Configure Flat File Source

  • Use any flat file you already have as a source.  Create the connection manager to it and use a Flat File Source in the Data Flow

Step 2: Add Data Viewer

  • Bring a Sort Transform in the Data Flow and connect the Flat File Source to it.
  • Right-click on the Data Flow Path between the Source and the Pivot Transform to open the Data Flow Path Editor.
  • Select Data Viewers then click Add and OK to add a Grid Data Viewer.  Click OK once more to return to the Data Flow

Step 3: Configure Sort Transfor

  • Open the Sort Transform Editor
  • In the Available Input Columns table use the far left column and check the column(s) you need to sort by.
  • Any columns you check will be added to the bottom where you can specify how they will be sorted.
    • Input Column:  This is the name of the input column that will be sorted
    • Output Alias:  This is the name the outputted column.  So if you want the column to have a new name you can change it here.
    • Sort Type:  Specify either ascending or descending for how the column will be sorted
    • Sort Order:  Works like the SortKeyPosition does in the Advanced Editor.  Specifies the order if multiple columns are sorted
    • Comparison Flag:  Allows special string functions like sort punctuation as symbols
  • Notice another nice feature in the bottom left of the editor is a check box that will remove rows with duplicate sort values.
  • Once the column(s) needed are checked and configured click OK

Step 4: Add Data Viewer and Destination

  • Drag over a Union All (if you are just testing) or an actual destination and connect the Pivot Transform to it.
  • Right-click on the Data Flow Path between the Source and the Pivot Transform to open the Data Flow Path Editor.
  • Select Data Viewers then click Add and OK to add a Grid Data Viewer.  Click OK once more to return to the Data Flow
  • Run the package to see the data pre and post sorted from the Data Viewers

Devin Knight

Devin is a BI consultant at Pragmatic Works Consulting. Previously, he has tech edited the book Professional Microsoft SQL Server 2008 Integration Services and was an author in the book Knight's 24-Hour Trainer: Microsoft SQL Server 2008 Integration Services. Devin has spoken at past conferences like PASS and at several SQL Saturday events. He is a contributing member to the Business Intelligence Special Interest Group (SIG) for PASS as a leader in the SSIS Focus Group. Making his home in Jacksonville, FL, Devin is a participating member of the local users’ group (JSSUG).


Posted by Anonymous on 25 June 2009

Pingback from  Dew Drop – June 25, 2009 | Alvin Ashcraft's Morning Dew

Posted by matthew_flannery on 29 June 2009

Reading part two I feel that I am missing something.  You say that asynchronous transforms should be avoided, but I do not see your rationale for why.  Because you have to wait for all the rows to arrive at the transform seems a little trite.  

Sorting and aggreagtion is going to cost you something regardless of where or how you do it.  You want to push the sorting and aggregation to the database but is it always cheaper to force the database engine to perform the task?  

If you are so dead set on avoiding a sort transform you could always deliver the flat file data to a staging table and have the database do the sorting.  This means that your package would have to have two data flow tasks, or you would have to break up your package into two packages.  

Posted by knight_devin@hotmail.com on 29 June 2009

Asynchronous transforms don't allow for reusing buffers.  Here's a link that explains in more detail.  


Yes, it's always going to be faster to perform these tasks in TSQL vs a task.  

As for using a staging table.  I don't have a problem with that.  In fact, I do this quite often but keep in mind some companies have strick policies prohibiting landing data that is not needed.

Posted by matthew_flannery on 30 June 2009

Thank you for the deeper explanation.  Although this begs the question why use SSIS at all for ETL if TSQL is more efficient.  SSIS becomes merely a fancy data transfer tool if we are going to avoid semi and fully blocking transformations.

The Sort transformation becomes almost unavoidable if you need to use the Merge Join or the Unpivot.  Lookups do not work against non SQL Server sources, but a Sort and Merge Join can accomplish the same thing.  This avoids the staging table restriction if it exists.

I have found the Aggregation very useful beyond the basic summarizing of data.  I have used the aggregation to go through millions of rows in seconds to find differences and imbalances between two data sets.

Used wisely the asynchronous transformations can be a great help getting the job done.

Posted by Ed on 8 July 2009

I'm not sure the sort transform can handle my specific sorting scenario.

I need to sort a flat file where the rows in the file have dissimilar column types, except for the first two columns.

I have 4 staging tables, each with different sets of columns, but the first two columns of each table are the same: employee id and record type.  Table 1 has a row for every employee and has record type '01'.  Tables 2-4 have 0 or 1 row for each employee and have types '02','03' and '09' respectively.  A bar ('|') delimited flat file is produced, with all '01' type records first, followed by all '02', '03' and '09'.  Now the client is requesting that rows in the flat file be grouped by employee id and sorted for each employee by record type.

Attempts to use the sort transform have failed with column truncation errors.  Am I missing a configuration setting, or should I be looking for an alternative solution?

Any suggestions are appreciated.



Posted by knight_devin@hotmail.com on 8 July 2009


Are you getting the truncation error from the sort transform or from the source.  If it's from the source there's an option in the Error Output tab to ignore or redirect truncation.  You'll probably want to ignore.  Email me directly if you want to talk about it more  dknight@pragmaticworks.com.

Leave a Comment

Please register or log in to leave a comment.