Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12

HASHBYTES: Is CHECKSUM really required? Expand / Collapse
Author
Message
Posted Wednesday, March 27, 2013 5:58 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, April 9, 2013 2:57 AM
Points: 1, Visits: 10
I found the issue with Null value.If we concatenate n columns in table1 and compare them with same type of concatenated n columns in table2 . If any column in table 1 is null, Hashbyte result of whole concatenated columns would be Null and binary_checksum result will be some default value. So first we have to use coalesce function or some other way to treat null value for columns. Then only we can take benefit of these comparison functions.
Post #1435848
Posted Wednesday, March 27, 2013 6:17 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Monday, November 24, 2014 1:01 PM
Points: 33, Visits: 345
HASHBYTES is meant for a literal string (or binary) import of one column, so any potentially null fields would need to be wrapped in ISNULL(x,'') - and that's assuming you want a NULL string to be treated the exact same as an empty string.

CHECKSUM and BINARY_CHECKSUM have multi-column input, as opposed to a single string value.
Post #1435855
Posted Wednesday, March 27, 2013 6:39 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, November 20, 2014 7:31 AM
Points: 2,360, Visits: 576
Your query using the CHECKSUM index is incorrect:

SELECT * 
FROM dbo.PostalCode
WHERE PostalCheckSum = CHECKSUM(@PostalArea, @PostalCity, @PostalState);

As previously mentioned, the value returned from CHECKSUM will not be unique for the inputs. The same input will always produce the same output, but different input is not guaranteed to produce different output.

Therefore, your query still needs to test the other columns:

SELECT * 
FROM dbo.PostalCode
WHERE PostalCheckSum = CHECKSUM(@PostalArea, @PostalCity, @PostalState)
AND PostalArea = @PostalArea
AND PostalCity = @PostalCity
AND PostalState = @PostalState;




Post #1435863
Posted Wednesday, March 27, 2013 8:18 AM


UDP Broadcaster

UDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP Broadcaster

Group: General Forum Members
Last Login: Friday, November 21, 2014 6:49 AM
Points: 1,451, Visits: 1,855
suharis (3/27/2013)
HI Nakul, we are using CHECKSUM currently, not sure if this is used by Default features of database, but always our etls are failing with CHECKSUM causing I/O errors


Are you trying to insert a CHECKSUM value? It should be a calculated field (it can be persisted by creating an index over the CHECKSUM).

This is a great forum question - can you post it with supporting details on the SSC forums?


Thanks & Regards,
Nakul Vachhrajani.
http://nakulvachhrajani.com
Be courteous. Drive responsibly.

Follow me on
Twitter: @sqltwins
Google Plus: +Nakul
Post #1435931
Posted Wednesday, March 27, 2013 9:42 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, November 24, 2014 4:11 AM
Points: 8, Visits: 100
Sorry this is a very bad example. Checksum like you say is not unique, creating very many duplicates. So as a solution is rather pointless, if the user gets incorrect results back from the query. Yes it saved index space, but the end query result is wrong!

Maybe I missed the point of the example?

Post #1435993
Posted Friday, March 29, 2013 3:52 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, April 1, 2013 10:43 AM
Points: 1, Visits: 8
Nakal, this is a bad article. A SELECT using the checksum is not guaranteed to return the same results as a search using the actual values. If you use this in an app the results will be unreliable and potentially this can go undetected since it will mostly be correct. Mostly correct is worse than useless.

I appreciate you would have meant well, but you should withdraw the post or fix it. If there is a use for the checksum index that is not flawed then by all means put that in instead. SScommitted showed a non-flawed query, so that would be usable. Not sure if that query is better with the checksum than without it, but if your post gave a comparison of the two, it would then be a worthwhile article.
Post #1436854
Posted Monday, April 1, 2013 2:25 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: 2 days ago @ 8:34 AM
Points: 291, Visits: 210
HASHBYTES comes at an additional storage cost, but not always as high as 8000 bytes as mentioned in your post.
Like danny.heijl points out it is only 16 bytes for MD5 or 20 bytes for SHA1.
Post #1437378
« Prev Topic | Next Topic »

Add to briefcase ««12

Permissions Expand / Collapse