﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / SQL Server 2005 / SQL Server 2005 Integration Services  / Checksum function??? / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Wed, 19 Jun 2013 08:07:09 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Checksum function???</title><link>http://www.sqlservercentral.com/Forums/Topic776261-148-1.aspx</link><description>The OP has not responded for quite some time now.SQL Learner?  Are you still working on this?  We'd appreciate some feedback as to how you've resolved the problem.  Having that in the thread will help others that come accross this thread down the road.</description><pubDate>Thu, 10 Sep 2009 10:26:10 GMT</pubDate><dc:creator>John Rowan</dc:creator></item><item><title>RE: Checksum function???</title><link>http://www.sqlservercentral.com/Forums/Topic776261-148-1.aspx</link><description>I have found the checksum to be useful in some cases and unreliable in some. If the data set if distinct then the checksum function (of all rows stands good), but in another case i had to get 27 Million rows from DB2 with no primary key, no uniqueness on the table, so i thought of using checksum to compare the rows in a staging environment. Not only did i have the overhead of creating the checksum, i also had to compare it + the checksum gives me a integer value based on the row uniqueness, where i had duplicates and everything was getting messed up..... so in short i would think the usage totally depends on the case.</description><pubDate>Thu, 10 Sep 2009 07:29:21 GMT</pubDate><dc:creator>vishal.gamji</dc:creator></item><item><title>RE: Checksum function???</title><link>http://www.sqlservercentral.com/Forums/Topic776261-148-1.aspx</link><description>Is there any sort of date column in the source table that could show you which rows have been updated?  If you could do something to avoid comparing the million rows, that may help.  Also, have you tried the SCD transformation?  Between that and proper indexing on your dimension, you should still get decent performance out of the compare.How much free reign do you have on the source system?  Would it be possible to make changes there to aid in making this easier?</description><pubDate>Tue, 08 Sep 2009 09:53:50 GMT</pubDate><dc:creator>John Rowan</dc:creator></item><item><title>RE: Checksum function???</title><link>http://www.sqlservercentral.com/Forums/Topic776261-148-1.aspx</link><description>If you are using sql 2005, and the columns you are interested in are varchar,nvarchar, or varbinary, you can try making a computed column out of the /b HashBytes /b function.  There are a few limits, one being the total input size cannot be greater than 8k ( I think this was fixed in either sp3 or 2008).  I also believe it is deterministic so you can put an index on it.  Use the SHA1 algorithm as the chance of collision is 1 in 2^80.</description><pubDate>Fri, 04 Sep 2009 08:37:08 GMT</pubDate><dc:creator>sqlrunner</dc:creator></item><item><title>RE: Checksum function???</title><link>http://www.sqlservercentral.com/Forums/Topic776261-148-1.aspx</link><description>[quote][b]Mark Pratt (9/1/2009)[/b][hr]I did this recently to determine if a row needed to be updated.  I put a checksum column in the table using the values that could possibly be updated.  The stored procedure had a checksum variable based on the input variables.  The input variables for the proc are also the DB columns that could possibly be updated.   By comparing the DB checksum calculation with the stored procedure checksum variable I knew whether the row needed updating or not.[/quote]Hi,Yep, this is exactly the point I was trying to make. Note that is very easily doable using a script component in a data flow and from what we've tried to far, goes pretty fast.Regards,Greg</description><pubDate>Wed, 02 Sep 2009 11:38:32 GMT</pubDate><dc:creator>ducon</dc:creator></item><item><title>RE: Checksum function???</title><link>http://www.sqlservercentral.com/Forums/Topic776261-148-1.aspx</link><description>[quote][b]Mark Pratt (9/1/2009)[/b][hr]I did this recently to determine if a row needed to be updated.  I put a checksum column in the table using the values that could possibly be updated.  The stored procedure had a checksum variable based on the input variables.  The input variables for the proc are also the DB columns that could possibly be updated.   By comparing the DB checksum calculation with the stored procedure checksum variable I knew whether the row needed updating or not.[/quote]Makes sense.</description><pubDate>Tue, 01 Sep 2009 11:20:19 GMT</pubDate><dc:creator>GSquared</dc:creator></item><item><title>RE: Checksum function???</title><link>http://www.sqlservercentral.com/Forums/Topic776261-148-1.aspx</link><description>I did this recently to determine if a row needed to be updated.  I put a checksum column in the table using the values that could possibly be updated.  The stored procedure had a checksum variable based on the input variables.  The input variables for the proc are also the DB columns that could possibly be updated.   By comparing the DB checksum calculation with the stored procedure checksum variable I knew whether the row needed updating or not.</description><pubDate>Tue, 01 Sep 2009 10:24:50 GMT</pubDate><dc:creator>JustANumber</dc:creator></item><item><title>RE: Checksum function???</title><link>http://www.sqlservercentral.com/Forums/Topic776261-148-1.aspx</link><description>[quote][b]SQL Learner (8/28/2009)[/b][hr]Thanks a lot GSquared....From what I got isI will have to add checksum column for each column that may change ...Source as well as Target...--edit[/quote]You need only a single checksum per row, based on all the columns whose values you are tracking for changes, eg[code="sql"]select checksum('A','B','Z')34391[/code]That's how you get the speed benefit - you need only compare two columns (PK and checksum) when looking for changes, rather than all columns.Note also Gus' comments about Checksum and Binary_Checksum</description><pubDate>Mon, 31 Aug 2009 03:48:20 GMT</pubDate><dc:creator>Phil Parkin</dc:creator></item><item><title>RE: Checksum function???</title><link>http://www.sqlservercentral.com/Forums/Topic776261-148-1.aspx</link><description>Hi there,Not sure this might help you out but in a project I'm working on, we have decided to go with an MD5 hash that is computed via a script task in our data flow (there are a bunch of ways to do this, that's the one we went with because we could easily switch for another scheme). The hash is saved in the dimension. When we get rows from the operational system, we compare the hash and based on that decide whether an update/insert/no action is needed. The thing is that with checksums, hashes etc. you can have collisions, you just have to check how high the risk and if you can ever afford one. One could argue that it's the same with GUIDs, while it's highly improbable, it's not impossible. If you can't afford that then you're better off with either a column-by-column comparison or an extra column in your source data that can tell you the latest time something was changed or something like that.Hope this helps,Greg</description><pubDate>Sun, 30 Aug 2009 19:04:37 GMT</pubDate><dc:creator>ducon</dc:creator></item><item><title>RE: Checksum function???</title><link>http://www.sqlservercentral.com/Forums/Topic776261-148-1.aspx</link><description>Thanks a lot GSquared....From what I got isI will have to add checksum column for each column that may change in my source in both the places....Source as well as Target...But my source is in different server (DB2) so I will have to stage it first and then add checksum for like 10 changing columns and similarly in my target i will add 10 checksum columns add will compare on them.????Is this how checksum works...?Actually my sroucre (DB2) has 7 million records...so i can't do row by row comparision(takes more than an hour)Please let me know whr to add columns ? in SRC STG or in TARGET?Thanks</description><pubDate>Fri, 28 Aug 2009 09:26:22 GMT</pubDate><dc:creator>Learner1</dc:creator></item><item><title>RE: Checksum function???</title><link>http://www.sqlservercentral.com/Forums/Topic776261-148-1.aspx</link><description>How wide is the dimension, and are we talking type 1 or type 2 SCD columns?</description><pubDate>Wed, 26 Aug 2009 13:10:03 GMT</pubDate><dc:creator>Michael Ross-468051</dc:creator></item><item><title>RE: Checksum function???</title><link>http://www.sqlservercentral.com/Forums/Topic776261-148-1.aspx</link><description>I'm pretty sure it was checksum, not binary.  Since I didn't care about the difference in case, which is the primary difference between the two.[code]select checksum('A'),checksum('a'), binary_checksum('A'), binary_checksum('a');[/code]Results:142 142 65 97With what I was doing, case differences didn't matter, only actual content differences.  That's the decision that determines which you should use.  Of course, it's also critical that you use the same one on both sides of the equation.</description><pubDate>Tue, 25 Aug 2009 12:42:36 GMT</pubDate><dc:creator>GSquared</dc:creator></item><item><title>RE: Checksum function???</title><link>http://www.sqlservercentral.com/Forums/Topic776261-148-1.aspx</link><description>Gus, are (did) you use CHECKSUM or BINARY_CHECKSUM?  If I was going this direction, I would look at the later.  Of course, I don't have the volume of data you are talking about either.</description><pubDate>Tue, 25 Aug 2009 12:30:04 GMT</pubDate><dc:creator>Lynn Pettis</dc:creator></item><item><title>RE: Checksum function???</title><link>http://www.sqlservercentral.com/Forums/Topic776261-148-1.aspx</link><description>You can use a checksum to narrow down the search.  It can make a good leading-edge for what would otherwise be a multi-column index.Checksum isn't exact, but it can reduce the number of rows that need to be compared, which can reduce I/O on large data sets.What you do is work out what columns you are most likely to need to compare (might be all of them, might be a subset), and generate checksums for them.  Persist and index those.  Then, join the source table and target table on the checksum columns, as well as other criteria.I've done speed tests with this in very large tables (one was half a petabyte for one table with hundreds of millions of rows), and it can result in much faster returns and much less I/O, when used on multi-column seeks/compares.That's the general idea.  Are you looking for specifics?  If so, please clarify what specifics you need help on.</description><pubDate>Tue, 25 Aug 2009 12:18:07 GMT</pubDate><dc:creator>GSquared</dc:creator></item><item><title>RE: Checksum function???</title><link>http://www.sqlservercentral.com/Forums/Topic776261-148-1.aspx</link><description>Here is my two cents.  Don't use it.  It is not a guarantee that it will identify all changes to your data.  It is possible for a combination of fields to result in the same checksum value.  Your best way to identify changes is, in my opinion, a column by column comparision.</description><pubDate>Tue, 25 Aug 2009 11:54:03 GMT</pubDate><dc:creator>Lynn Pettis</dc:creator></item><item><title>RE: Checksum function???</title><link>http://www.sqlservercentral.com/Forums/Topic776261-148-1.aspx</link><description>any help?</description><pubDate>Tue, 25 Aug 2009 11:47:54 GMT</pubDate><dc:creator>Learner1</dc:creator></item><item><title>Checksum function???</title><link>http://www.sqlservercentral.com/Forums/Topic776261-148-1.aspx</link><description>HiCan anyone guide me on using checksum to compare million records in source and target...I have to update a dimension if sumthing changes, I cant use lookup as it takes lot of time...If anyone has ever used checksum() function, can you guide me on how to use it...Any help is appreciated.thanks</description><pubDate>Mon, 24 Aug 2009 13:12:18 GMT</pubDate><dc:creator>Learner1</dc:creator></item></channel></rss>