SQL Clone
SQLServerCentral is supported by Redgate
Log in  ::  Register  ::  Not logged in

Space Sensitivity in SSIS Lookups

It's been well-documented through myriad blogs and forum posts about the case sensitivity of the comparisons in the SSIS lookup transformation (a good review can be found here). In a nutshell, a comparison using the lookup transformation is case sensitive when using the default setting of Full Cache, even if the values in the database are stored in a case insensitive collation. This happens because, in Full Cache mode, the comparison is done by the SSIS engine rather than the database engine, the former of which differentiates between "Value", "VALUE", and "VaLuE" regardless of the collation setting.

But did you know that this transformation is space sensitive as well? Consider the following T-SQL code:

SELECT Cast('char value' AS CHAR(20)) [val1]
INTO   test1

SELECT Cast('char value' AS CHAR(40)) [val2]
INTO   test2

SELECT t1.val1,
FROM   test1 t1
       INNER JOIN test2 t2
         ON t1.val1 = t2.val2

As you would expect, executing this code results in a successful match (INNER JOIN), even though we're comparing CHAR values of differing lengths (for more information, see this article for more information about spaces and padding in SQL Server).

However, when the same comparison is run through an SSIS lookup transformation in Full Cache mode, the lookup on our sample data will not result in a match. Similar to the case sensitive lookup, you'll find that the SSIS engine would treat the strings 'Hello World' and 'Hello World  ' (note the trailing spaces) as dissimilar values.  Unlike in SQL Server, trailing whitespace is significant in SSIS value comparisons.

As a workaround, you can use the TRIM() function in SSIS and the RTRIM() T-SQL function to insure that your comparisons are ignorant of trailing whitespace.  Alternatively, you could use a cache mode other than Full Cache, but you should be aware of the other implications before making such a change. 

Note that this behavior is limited to fixed-length character fields, but could lead to some unexpected and hard-to-detect problems if you aren’t aware of the behavior.

Tim Mitchell

Tim Mitchell is a business intelligence consultant, author, trainer, and Microsoft Data Platform MVP with over thirteen years of data management experience. He is the founder and principal of Tyleris Data Solutions.

Tim has spoken at international and local events including the SQL PASS Summit, SQLBits, SQL Connections, along with dozens of tech fests, code camps, and SQL Saturday events. He is coauthor of the book SSIS Design Patterns, and is a contributing author on MVP Deep Dives 2.

You can visit his website and blog at TimMitchell.net or follow him on Twitter at @Tim_Mitchell.


Posted by Steve Jones on 29 September 2009

I did not know that. However that makes more sense to me. If I want matches, then I want to have exact matches. This has been a problem at time in SQL Server for me when it removes spaces that I want considered, or lengths are unknowningly different.

I wish we had an = and a == that would allow us both behaviors in the db engine.

Posted by robp on 30 September 2009



SELECT t1.val1,


FROM   test1 t1

      INNER JOIN test2 t2  

      ON t1.val1 like t2.val2

Leave a Comment

Please register or log in to leave a comment.