Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Devin Knight

Devin is a BI consultant at Pragmatic Works Consulting. Previously, he has tech edited the book Professional Microsoft SQL Server 2008 Integration Services and was an author in the book Knight's 24-Hour Trainer: Microsoft SQL Server 2008 Integration Services. Devin has spoken at past conferences like PASS and at several SQL Saturday events. He is a contributing member to the Business Intelligence Special Interest Group (SIG) for PASS as a leader in the SSIS Focus Group. Making his home in Jacksonville, FL, Devin is a participating member of the local users’ group (JSSUG).

SSIS – CHECKSUM Transformation

CHECKSUM is a TSQL function that computes a hash value over a list of arguments.  The great thing about checksum2creating a hash value for a table is that it gives you a unique value to represent each row. 

When would you use the CHECKSUM Function?

An example of using a CHECKSUM is to store the unique value for the entire row in a column that can be used later for comparison.  This is useful in a condition where all of the rows in a table need to be compared to perform an UPDATE.

When would you use the CHECKSUM Transformation?

Now that you have a little background on what CHECKSUM is it’s time to learn how to use it in SSIS.  The CHECKSUM Transformation is not included by default in SSIS but it was developed by Konesans and can be downloaded from www.sqlis.com

checksum

Normally in a package that is required to check for changes in data every single column from the destination must be compared to the incoming source.  With the CHECKSUM Transform creating a unique value for each row you only have to compare one column and that’s your CHECKSUM Value.

Checksum != DestCheckSum

To configure the CHECKSUM Transform open the editor and check the columns that you want to have the hash value created for and give the output column a name.

This is a great tool and a big time saver when developing!

Comments

Posted by Steve Jones on 5 January 2009

Good to see you blogging and nice summary here. I might include a slightly longer example of where this makes sense, perhaps a scenario that you've used it.

Posted by knight_devin@hotmail.com on 5 January 2009

Thanks for the advice.  Maybe this topic would be a good article if I elaborated a lot more.

Posted by Steve Jones on 6 January 2009

It definitely would be. It's a good blog length and a specific or two would help, but I wouldn't walk through step by step. If you do that, make it an article and submit it me or another publisher.

Posted by markve on 8 January 2009

It does not work for the CRC32 algorithm. Create an SQL table ChecksumTest with 1 or more columns. Fill it with the same values for about 5 rows. You will see that the algorithm "Original" and "FrameworkChecksum" works fine. The results in all rows are the same. For the CRC32 algorithm the value is different for each row.

Anyone got an idea what's going wrong here?

Posted by knight_devin@hotmail.com on 8 January 2009

That's interesting I haven't tried using that algorithm.  I did find this link explaining the CRC32 algorithm and it says not to use it for data integrity.  I don't really see  a good reason why though.

www.accuhash.com/what-is-crc32.html

Posted by Hubajube on 13 January 2009

You can't use CRC32 if you must guarantee that you catch every change.  That is, there exist many inputs that have the same checksum.

The CRC32 algorithm bug over at sqlis.com is a different beast.  It seems like they are including the row number in the input somehow, meaning that identical values will produce different checksums if they exist in different rows.  That's unfortunate because it makes this algorithm unusable.

Posted by rmg66 on 13 January 2009

I discovered the same "bug" myself. Since the other two options (Original and FrameworkChecksum) have known issues I want to avoid using them, but this issue with the CRC32 option makes it impossible to use. I have breifly reviewed the code (I'm not a dot.net programmer by any means) but it seems the stream that you are pushing to the hash function is somehow grabbing the internal row number on each row that it processes.

I can't imagine this is by design, unless I'm missing something here.

Posted by raghava_2507-1128442 on 19 August 2010

I have created a checksum column(out of three columns) in a table. In the SSIS checksum transformation also I have selected the same three columns and opted for CRC-32 algorithm. Now the problem here is, there is no match between the hash value generated from the SQL server and the value generated from the SSIS checksum transformation.

Please help.

Posted by developermax on 30 May 2011

Hi,

Another way to use CHECKSUM for comparing rows between source and destination it is to add the CHECKSUM function direct in the queries for source  and lookup and use after

a conditional split to redirect the rows how changed .

Posted by westers on 25 August 2011

Using TSQL checksum is unreliable for change detection on numeric types as the following example shows.

-- changing 10.0 to 100.0 should produce a unique checksum

-- but it doesn't.  Same applies to binary_checksum

SELECT checksum('a',10.0), checksum('a',100.0)

Posted by knight_devin@hotmail.com on 1 November 2011

Just catching up on comments guys.. I actually don't recommend using this tool now for the reasons documented in the comments.  I use the TSQL Function Hashbytes now.

Leave a Comment

Please register or log in to leave a comment.