HASHBYTES: Is CHECKSUM really required?

Posted Tuesday, March 26, 2013 10:37 PM (Ten Centuries)

Comments posted to this topic are about the item HASHBYTES: Is CHECKSUM really required?

Thanks & Regards,
Nakul Vachhrajani.
http://beyondrelational.com/modules/2/blogs/77/nakuls-blog.aspx
Be courteous. Drive responsibly.

Follow me on
Twitter: @nakulv_sql
Google Plus: +Nakul
Post #1435761
Posted Wednesday, March 27, 2013 1:26 AM (SSC-Enthusiastic)

Thanks for the article, Nakul.

I have a question about HASHBYTES (and hashing in general) to which I have so far been unable to find a definitive answer. Perhaps you know: you list HASHBYTES as providing change detection, but is that really true?

In many places you see claims that "any change" will affect the hash value, but that is not true: for any (finite) hash algorithm there are always two files (or data blobs) of the same size that differ in at most two bits and yet share the same hash value. (Take any file of more than 2^m bits, where m is the hash output length in bits, together with all of its one-bit modifications; by the pigeonhole principle two of these must collide, and any two of them differ in at most two bits.) But it is not so obvious, at least to me, whether there can be two files/data blobs that differ by only one bit and still have the same hash value. Of course I realise that, from a practical point of view, it is extremely unlikely that any two given files/data blobs will have the same hash value, but I am interested in the theoretical question: will a single-bit change always change the hash value?
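
To make the question concrete, here is a minimal T-SQL sketch. The inputs are arbitrary; it only shows that these two one-bit-apart inputs happen to hash differently, not that any one-bit change must:

SELECT HASHBYTES('SHA1', 0x00) AS hash_of_00,  -- bit pattern 00000000
       HASHBYTES('SHA1', 0x01) AS hash_of_01;  -- bit pattern 00000001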

Thanks,
Alex
Post #1435785
Posted Wednesday, March 27, 2013 1:57 AM (Forum Newbie)

Hi guys,

Thanks a lot for the article; just a few words of advice.

I have used this approach and run into several problems with the checksum process.

If any of the key columns contains a NULL value, the checksum fails or returns a "strange" value. So take care that all fields have a value, and ideally declare them NOT NULL. To reduce checksum failures, I also add a "|" separator between the fields and wrap each one in an ISNULL function, just in case, as in the sketch below.
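
A minimal sketch of that pattern (the column names are hypothetical, and the VALUES row constructor assumes SQL Server 2008+). Without the ISNULL calls, the first row's checksum input would be NULL, because concatenating NULL yields NULL:

SELECT CHECKSUM(
           ISNULL(FirstName, '') + '|' +
           ISNULL(LastName,  '')
       ) AS RowChecksum
FROM (VALUES ('John', NULL),
             ('John', 'Smith')) AS Customer (FirstName, LastName);

The "|" separator also stops adjacent values such as ('ab', 'c') and ('a', 'bc') from concatenating to the same string.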

Regards
Post #1435791
Posted Wednesday, March 27, 2013 4:18 AM (Forum Newbie)

Is CHECKSUM guaranteed to be unique? Can different combinations of data give the same CHECKSUM value?
Post #1435815
Posted Wednesday, March 27, 2013 4:54 AM (Forum Newbie)

A checksum is only 4 bytes, which leaves very little room for uniqueness, and it is very easily tampered with.

HASHBYTES produces a 16-byte (MD5) or 20-byte (SHA1) hash value calculated over a maximum of 8,000 input bytes, with a much better guarantee of uniqueness, and in the case of SHA1 it is also very difficult to tamper with.

So HASHBYTES (SHA1) is the right choice in my opinion; see the sketch below.
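
A minimal sketch of both points. The collision example assumes a case-insensitive default collation (the usual setup), since CHECKSUM follows the collation's string comparison rules:

-- Output sizes
SELECT DATALENGTH(CHECKSUM('hello'))           AS checksum_bytes, -- 4 (an INT)
       DATALENGTH(HASHBYTES('MD5',  'hello'))  AS md5_bytes,      -- 16
       DATALENGTH(HASHBYTES('SHA1', 'hello'))  AS sha1_bytes;     -- 20

-- An easy CHECKSUM collision on a case-insensitive collation:
SELECT CHECKSUM('hello'), CHECKSUM('HELLO');   -- identical values
SELECT HASHBYTES('SHA1', 'hello'),
       HASHBYTES('SHA1', 'HELLO');             -- different digests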
Post #1435823
Posted Wednesday, March 27, 2013 5:20 AM (SSC Rookie)

Good article, Nakul, but it would have benefited immensely from a paragraph on fault tolerance.

Microsoft's MSDN page refers to it explicitly:
http://msdn.microsoft.com/en-gb/library/ms189788.aspx

As the number of table rows increases, so does the chance of a "Birthday Paradox" collision. Since the point of the strategy is to reduce index size, and therefore search times, I would imagine you would be targeting a table with a large number of rows, so it is perhaps a counterproductive strategy.

That said, tables with a relatively low number of rows but a very wide search key may well benefit from the CHECKSUM strategy; the sketch below shows the usual pattern.
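
For reference, a minimal sketch of that hash-index pattern (table and column names are hypothetical; the inline DECLARE initialisation assumes SQL Server 2008+). The second predicate in the WHERE clause is what filters out checksum collisions:

CREATE TABLE dbo.Document (
    DocumentID  INT IDENTITY PRIMARY KEY,
    DocumentURL NVARCHAR(3000) NOT NULL,     -- too wide to index directly
    URLChecksum AS CHECKSUM(DocumentURL)     -- 4-byte computed column
);
CREATE INDEX IX_Document_URLChecksum ON dbo.Document (URLChecksum);

DECLARE @URL NVARCHAR(3000) = N'http://example.com/page1';
SELECT DocumentID
FROM dbo.Document
WHERE URLChecksum = CHECKSUM(@URL)   -- fast seek on the narrow index
  AND DocumentURL = @URL;            -- re-check to discard collisions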
Post #1435832
Posted Wednesday, March 27, 2013 5:52 AM (Ten Centuries)

Thank you, all, for your time and valuable feedback.

Yes, CHECKSUM fails to detect symmetric changes and has therefore been listed only as a "basic" change detection mechanism. The "ideal" one would be HASHBYTES, but it comes at an additional cost: storage. A sketch of that trade-off follows below.

As mentioned in the article, I would request the kind reader to go through my in-depth study of the various change and tamper detection mechanisms, where I evaluate the pros and cons of each method (Part 01 and Part 02).
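
A minimal sketch of the storage cost (hypothetical table and columns; HASHBYTES is deterministic, so the computed column can be persisted):

CREATE TABLE dbo.CustomerHashed (
    CustomerID INT PRIMARY KEY,
    FirstName  VARCHAR(50),
    LastName   VARCHAR(50),
    -- 20 extra bytes per row buys reliable change detection; PERSISTED
    -- stores the hash instead of recomputing it on every read.
    RowHash AS HASHBYTES('SHA1',
        ISNULL(FirstName, '') + '|' + ISNULL(LastName, '')) PERSISTED
);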


Thanks & Regards,
Nakul Vachhrajani.
http://beyondrelational.com/modules/2/blogs/77/nakuls-blog.aspx
Be courteous. Drive responsibly.

Follow me on
Twitter: @nakulv_sql
Google Plus: +Nakul
Post #1435845
Posted Wednesday, March 27, 2013 5:55 AM (Forum Newbie)

Hi Nakul, we are currently using CHECKSUM. I am not sure whether it is used by any default features of the database, but our ETLs keep failing, with CHECKSUM causing I/O errors.
Post #1435846