SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Use HASHBYTES to identify changes


Use HASHBYTES to identify changes

Author
Message
Jim Youmans-439383
Jim Youmans-439383
SSC-Addicted
SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)

Group: General Forum Members
Points: 474 Visits: 536
Comments posted to this topic are about the item Use HASHBYTES to identify changes
jack.james.holmes
jack.james.holmes
SSC Journeyman
SSC Journeyman (79 reputation)SSC Journeyman (79 reputation)SSC Journeyman (79 reputation)SSC Journeyman (79 reputation)SSC Journeyman (79 reputation)SSC Journeyman (79 reputation)SSC Journeyman (79 reputation)SSC Journeyman (79 reputation)

Group: General Forum Members
Points: 79 Visits: 71
Hi

This would be really useful for us to speed up our ETL process, but when I adapt the code to run for our main fact table it tells me

"String or binary data would be truncated."

I am guessing this is because the row has too much data in it?

Is there any way I can get around this so that it still identifies changes to any column in the record? What is the maximum amount of data I can use this function with?

Thanks


Jack
David Griffiths-273839
David Griffiths-273839
SSC Veteran
SSC Veteran (291 reputation)SSC Veteran (291 reputation)SSC Veteran (291 reputation)SSC Veteran (291 reputation)SSC Veteran (291 reputation)SSC Veteran (291 reputation)SSC Veteran (291 reputation)SSC Veteran (291 reputation)

Group: General Forum Members
Points: 291 Visits: 164
Doesn't the column set returned by the SELECT * include the column rowDiffHash ?

In which case the hash is non-deterministic - in the sense that repeating the UPDATE multiple times would return different results each time.
Jim Youmans-439383
Jim Youmans-439383
SSC-Addicted
SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)

Group: General Forum Members
Points: 474 Visits: 536
The Hashbytes function should always return a varbinary(20), but the input values are limited to 8000 bytes. So if your row adds up to more than 8000 bytes, then that would be the issue. Instead of doing a "SELECT *" you could do a SELECT of only the columns that you want to watch for change on.

Jim
Jim Youmans-439383
Jim Youmans-439383
SSC-Addicted
SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)

Group: General Forum Members
Points: 474 Visits: 536
Yes, SELECT * will include the rowDiffHash if it has a value. My process checks the record set received today against the record set received the day before, so for me, the rowDiffHash is always NULL. If you want to recalculate the row value, you will have to select specific columns and exclude the rowDiffHash column, otherwise that would throw off the process.

Jim
Nadrek
Nadrek
SSCommitted
SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)

Group: General Forum Members
Points: 1876 Visits: 2726
Jim Youmans-439383 (11/25/2013)
The Hashbytes function should always return a varbinary(20), but the input values are limited to 8000 bytes. So if your row adds up to more than 8000 bytes, then that would be the issue. Instead of doing a "SELECT *" you could do a SELECT of only the columns that you want to watch for change on.

Jim


Agreed, Jim - particularly with the XML PATH, there's a huge amount of overhead for each row, so it's very easy to end up with truncation/errors.

Additionally, where are the benchmarks (done with Profiler) that show performance differences, including the overhead of the extra column and HASHBYTES calculations, with good indexing?

Many other pitfalls involved in using hashes to detect differences are discussed in:
http://www.sqlservercentral.com/Forums/Topic902868-507-1.aspx

Suffice it to say:
If the hash is different, the data is different.
If the hash is the same, you know nothing about the data for certain.
Jim Youmans-439383
Jim Youmans-439383
SSC-Addicted
SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)SSC-Addicted (474 reputation)

Group: General Forum Members
Points: 474 Visits: 536
If you use HASHBYTES with the SHA1 you can expect a collision after you generate hashes for about 1,208,925,819,614,629,174,706,176 rows of data. This would take me a few million years. If you use the MD5, collisions are much more common.

Here is a really good article about it:
http://sqlblog.com/blogs/michael_coles/archive/2010/04/17/find-a-hash-collision-win-100.aspx
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search