﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / SQL Server 2005 / T-SQL (SS2K5)  / checksums and unicode data / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Wed, 22 May 2013 09:58:48 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: checksums and unicode data</title><link>http://www.sqlservercentral.com/Forums/Topic765856-338-1.aspx</link><description>[quote][b]Paul White (8/10/2009)[/b][hr][quote][b]Jeff Moden (8/10/2009)[/b][hr]In Lowell's example, it also turned a 13 byte string into a 20 byte Varbinary so it's expensive, too.  But, I believe that Barry's point was simply that it is unbreakable... not like CHECKSUM at all. :-)[/quote]So...SHA1 always returns 160 bits (20 bytes) regardless of the input.  HashBytes also appears to be limited to 8000 bytes of input - 8000 ANSI characters or 4000 Unicode.  Finally, it doesn't appear unbreakable either: [url=http://en.wikipedia.org/wiki/SHA_hash_functions]Wikipedia Link[/url].I mentioned HashBytes way back - now I'm just pointing out some of the reasons that I have yet to use it in a real production system.Paul[/quote]Hmm, I though that I counted 19 bytes...  Anyway, qualifying myself again :-), "nigh-unbreakable" was referring to its use as a signature in detecting random data changes, not as a security device.</description><pubDate>Tue, 11 Aug 2009 07:24:11 GMT</pubDate><dc:creator>RBarryYoung</dc:creator></item><item><title>RE: checksums and unicode data</title><link>http://www.sqlservercentral.com/Forums/Topic765856-338-1.aspx</link><description>[quote][b]Paul White (8/10/2009)[/b][hr][quote][b]RBarryYoung (8/10/2009)[/b][hr]To clarify: [i]HashBytes[/i] w/ SHA1 is nigh-unbreakable.[/quote]Hey Barry,On the other hand, it is relatively slow, only works on strings, and returns varbinary(8000).If the task is to detect changes in a row of data, would you use HashBytes (SHA1) alone?Paul[/quote]Despite the datatype, I think that it actually only returns 19 bytes.  And since VARBINARY is a string and *everything* converts to varbinary very easily, thats not that big a problem (nulls are a bigger problem).  So yeah, I *might* use it in situations were I was doing remote comparisons through linked servers and just recording the hash every day for later comparison.  The problem with the field by field comparison is that you have to have the entire previous record around to do it.Though I think that I usually used MD5 in the past.</description><pubDate>Tue, 11 Aug 2009 07:20:16 GMT</pubDate><dc:creator>RBarryYoung</dc:creator></item><item><title>RE: checksums and unicode data</title><link>http://www.sqlservercentral.com/Forums/Topic765856-338-1.aspx</link><description>[quote][b]Jeff Moden (8/10/2009)[/b][hr]In Lowell's example, it also turned a 13 byte string into a 20 byte Varbinary so it's expensive, too.  But, I believe that Barry's point was simply that it is unbreakable... not like CHECKSUM at all. :-)[/quote]So...SHA1 always returns 160 bits (20 bytes) regardless of the input.  HashBytes also appears to be limited to 8000 bytes of input - 8000 ANSI characters or 4000 Unicode.  Finally, it doesn't appear unbreakable either: [url=http://en.wikipedia.org/wiki/SHA_hash_functions]Wikipedia Link[/url].I mentioned HashBytes way back - now I'm just pointing out some of the reasons that I have yet to use it in a real production system.Paul</description><pubDate>Mon, 10 Aug 2009 20:49:28 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: checksums and unicode data</title><link>http://www.sqlservercentral.com/Forums/Topic765856-338-1.aspx</link><description>[quote][b]Paul White (8/10/2009)[/b][hr][quote][b]RBarryYoung (8/10/2009)[/b][hr]To clarify: [i]HashBytes[/i] w/ SHA1 is nigh-unbreakable.[/quote]Hey Barry,On the other hand, it is relatively slow, only works on strings, and returns varbinary(8000).If the task is to detect changes in a row of data, would you use HashBytes (SHA1) alone?Paul[/quote]In Lowell's example, it also turned a 13 byte string into a 20 byte Varbinary so it's expensive, too.  But, I believe that Barry's point was simply that it is unbreakable... not like CHECKSUM at all. :-)</description><pubDate>Mon, 10 Aug 2009 19:46:06 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: checksums and unicode data</title><link>http://www.sqlservercentral.com/Forums/Topic765856-338-1.aspx</link><description>just looking for confirmation...i used my google-fu and found info and examples for using hashbytes with SHA1, ie[code]select HASHBYTES('SHA1','My Plain Text')--results0x6D99DDF6FE7A32547B6766E0BF88B1F50835F0FF[/code]everything i read says that this is a one way operation, so you can use it to generate a unique value like checksum, but you cannot unhash this back to it's original value, right?so this is not encryption, but rather a tool to generate a unique identifier, correct?</description><pubDate>Mon, 10 Aug 2009 19:08:22 GMT</pubDate><dc:creator>Lowell</dc:creator></item><item><title>RE: checksums and unicode data</title><link>http://www.sqlservercentral.com/Forums/Topic765856-338-1.aspx</link><description>[quote][b]RBarryYoung (8/10/2009)[/b][hr]To clarify: [i]HashBytes[/i] w/ SHA1 is nigh-unbreakable.[/quote]Hey Barry,On the other hand, it is relatively slow, only works on strings, and returns varbinary(8000).If the task is to detect changes in a row of data, would you use HashBytes (SHA1) alone?Paul</description><pubDate>Mon, 10 Aug 2009 18:22:04 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: checksums and unicode data</title><link>http://www.sqlservercentral.com/Forums/Topic765856-338-1.aspx</link><description>To clarify: [i]HashBytes[/i] w/ SHA1 is nigh-unbreakable.</description><pubDate>Mon, 10 Aug 2009 17:49:21 GMT</pubDate><dc:creator>RBarryYoung</dc:creator></item><item><title>RE: checksums and unicode data</title><link>http://www.sqlservercentral.com/Forums/Topic765856-338-1.aspx</link><description>[quote][b]Jeff Moden (8/10/2009)[/b][hr]Ummmm.... does SQL Server's CHECKSUM use the SHA1 algorithm because I was talking about [i]that [/i]CHECKSUM.  Obviously that doesn't take 6000 to 12000 years to happen because I've seen two of these same types of posts in the last 3 months. ;-)[/quote]Oh no, sorry, CHECKSUM produces duplicates if you so much as swap the order of two of the characters.  So it stinks for this purpose, and we are in agreement on that.  :-D</description><pubDate>Mon, 10 Aug 2009 17:48:16 GMT</pubDate><dc:creator>RBarryYoung</dc:creator></item><item><title>RE: checksums and unicode data</title><link>http://www.sqlservercentral.com/Forums/Topic765856-338-1.aspx</link><description>Ummmm.... does SQL Server's CHECKSUM use the SHA1 algorithm because I was talking about [i]that [/i]CHECKSUM.  Obviously that doesn't take 6000 to 12000 years to happen because I've seen two of these same types of posts in the last 3 months. ;-)</description><pubDate>Mon, 10 Aug 2009 17:21:06 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: checksums and unicode data</title><link>http://www.sqlservercentral.com/Forums/Topic765856-338-1.aspx</link><description>[quote][b]Jeff Moden (8/9/2009)[/b][hr][quote][b]Paul White (8/8/2009)[/b][hr]HashBytes (perhaps including the PK as a salt) is a good choice if the task is to detect changes.CHECKSUM is quick and efficient for creating hash indexes - which are awesome if you need to search long strings (for example).[/quote]Even then, it can and will give an incorrect answer if you rely only on it.  I'll admit that CHECKSUM will help narrow things down on long strings, though.[/quote]I'm, going to have to disagree with this one, Jeff, at least as far as HashBytes using the SHA1 algorithm is concerned.  Assuming you did a [i]Trillion[/i] HashByte comparisons of different data a day (and no one on earth is doing that many), it would still take about 6000 to 12000 Years before you were likely to get an accidental match.  Those are odds that I can live with.  :-)</description><pubDate>Mon, 10 Aug 2009 16:38:08 GMT</pubDate><dc:creator>RBarryYoung</dc:creator></item><item><title>RE: checksums and unicode data</title><link>http://www.sqlservercentral.com/Forums/Topic765856-338-1.aspx</link><description>[quote][b]GabyYYZ (8/10/2009)[/b][hr]Is this due to the way checksum parses unicode vs. binary?[/quote]Seems so:[code]-- 0x3600 3700 3400SELECT (CONVERT(VARBINARY, N'674'));SELECT CONVERT(BINARY(2), UNICODE(N'6'))    -- 0x0036SELECT CONVERT(BINARY(2), UNICODE(N'7'))    -- 0x0037SELECT CONVERT(BINARY(2), UNICODE(N'4'))    -- 0x0034[/code]The binary representation is byte-reversed.Paul</description><pubDate>Mon, 10 Aug 2009 15:54:45 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: checksums and unicode data</title><link>http://www.sqlservercentral.com/Forums/Topic765856-338-1.aspx</link><description>Here's a thought.Given that a binary representation of a value is as unambiguous a way to represent a value as you can get, explain how this occurs:[code]select convert(varbinary, convert(nvarchar(3),'674'));  -- result 0x36003 7 003400select convert(varbinary, convert(nvarchar(3),'654'));  -- result 0x36003 5 003400[/code]Running a checksum on the two binary values gives them a different result from the original checksum and from each other this time (unlike the original where the two were the same).  Is this due to the way checksum parses unicode vs. binary?</description><pubDate>Mon, 10 Aug 2009 10:58:08 GMT</pubDate><dc:creator>GabyYYZ</dc:creator></item><item><title>RE: checksums and unicode data</title><link>http://www.sqlservercentral.com/Forums/Topic765856-338-1.aspx</link><description>[quote][b]Jeff Moden (8/9/2009)[/b][hr]Even then, it can and will give an incorrect answer if you rely only on it.  I'll admit that CHECKSUM will help narrow things down on long strings, though.[/quote]My admission for the day: I have never relied on HashBytes (with the key as a salt) - I always end up get nervous and comparing all the values instead :-DTechnically I think the chances of a collision using HashBytes (with SHA1) in the way I describe is overwhelmingly unlikely, though the mathematics to demonstrate that are far beyond me.  My co-workers consider CHECKSUM with a PK salt to be sufficient (please don't get me started on that).Summary: when hash indexing, I use CHECKSUM to narrow the search followed by an explicit comparison.To detect changes, I compare all column values (unless I am fortunate enough to be working in 2K8 when I try to use CDC).  Many people swear by HashBytes - I am more likely to swear at it - though my views are not universally shared, it must be said.Paul</description><pubDate>Sun, 09 Aug 2009 17:18:04 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: checksums and unicode data</title><link>http://www.sqlservercentral.com/Forums/Topic765856-338-1.aspx</link><description>[quote][b]Paul White (8/8/2009)[/b][hr]HashBytes (perhaps including the PK as a salt) is a good choice if the task is to detect changes.CHECKSUM is quick and efficient for creating hash indexes - which are awesome if you need to search long strings (for example).[/quote]Even then, it can and will give an incorrect answer if you rely only on it.  I'll admit that CHECKSUM will help narrow things down on long strings, though.</description><pubDate>Sun, 09 Aug 2009 12:36:29 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: checksums and unicode data</title><link>http://www.sqlservercentral.com/Forums/Topic765856-338-1.aspx</link><description>[quote][b]Jeff Moden (8/5/2009)[/b][hr]I don't know why they even bothered except that every language seems to have one and many of them have the same problem.  Closer to being perfect is the LUHN 10 checksum (sanity check on credit card numbers)... but even that can have a failure.[/quote]CHECKSUM is perhaps an unfortunate name for the function.  It is not intended to be used as a checksum in the CRC sense:[quote][b]Books Online: CHECKSUM (T-SQL)[/b][hr]CHECKSUM computes a hash value, called the checksum, over its list of arguments. The hash value is intended for use in building hash indexes. If the arguments to CHECKSUM are columns, and an index is built over the computed CHECKSUM value, the result is a hash index. This can be used for equality searches over the columns.CHECKSUM satisfies the properties of a hash function: CHECKSUM applied over any two lists of expressions returns the same value if the corresponding elements of the two lists have the same type and are equal when compared using the equals (=) operator. For this definition, null values of a specified type are considered to compare as equal. If one of the values in the expression list changes, the checksum of the list also generally changes. However, there is a small chance that the checksum will not change. For this reason, we do not recommend using CHECKSUM to detect whether values have changed, unless your application can tolerate occasionally missing a change. Consider using HashBytes instead. When an MD5 hash algorithm is specified, the probability of HashBytes returning the same result for two different inputs is much lower than that of CHECKSUM.The order of expressions affects the resultant value of CHECKSUM. The order of columns used with CHECKSUM(*) is the order of columns specified in the table or view definition. This includes computed columns.[/quote]HashBytes (perhaps including the PK as a salt) is a good choice if the task is to detect changes.CHECKSUM is quick and efficient for creating hash indexes - which are awesome if you need to search long strings (for example).Paul</description><pubDate>Sat, 08 Aug 2009 06:07:14 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: checksums and unicode data</title><link>http://www.sqlservercentral.com/Forums/Topic765856-338-1.aspx</link><description>[quote][b]Adam Gojdas (8/7/2009)[/b][hr][quote]Closer to being perfect is the[b] LUN 10 checksum[/b] (sanity check on credit card numbers)... but even that can have a failure.[/quote]Just in case anyone else is interested in finding out more regarding the checksum Jeff mentions I found it is actually:[b]LUHN10 checksum[/b][/quote]Heh... thanks for the correction... I ate garlic and didn't want to breath on anyone too hard so I left the "H" out. :-P</description><pubDate>Fri, 07 Aug 2009 17:53:11 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: checksums and unicode data</title><link>http://www.sqlservercentral.com/Forums/Topic765856-338-1.aspx</link><description>[quote]Closer to being perfect is the[b] LUN 10 checksum[/b] (sanity check on credit card numbers)... but even that can have a failure.[/quote]Just in case anyone else is interested in finding out more regarding the checksum Jeff mentions I found it is actually:[b]LUHN10 checksum[/b]</description><pubDate>Fri, 07 Aug 2009 14:39:09 GMT</pubDate><dc:creator>Adam Gojdas</dc:creator></item><item><title>RE: checksums and unicode data</title><link>http://www.sqlservercentral.com/Forums/Topic765856-338-1.aspx</link><description>...and the checksum algorithm is nothing more than a simple "Exclusive OR" at the byte level.  It does NOT quarantee that two rows are different and it doesn't guarantee that a row hasn't been somehow changed.  It should only be used as a basic sanity check (if that).  I don't know why they even bothered except that every language seems to have one and many of them have the same problem.Closer to being perfect is the LUN 10 checksum (sanity check on credit card numbers)... but even that can have a failure.</description><pubDate>Wed, 05 Aug 2009 22:46:45 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: checksums and unicode data</title><link>http://www.sqlservercentral.com/Forums/Topic765856-338-1.aspx</link><description>The primary differnce is the way the data is stored single-byte versus double-byte. It just so happens that the checksum algorithmn generates duplicates in this particular case.</description><pubDate>Wed, 05 Aug 2009 16:18:43 GMT</pubDate><dc:creator>Lamprey13</dc:creator></item><item><title>checksums and unicode data</title><link>http://www.sqlservercentral.com/Forums/Topic765856-338-1.aspx</link><description>Can someone explain whyselect checksum(convert(nvarchar(3),'674'));select checksum(convert(nvarchar(3),'645'));yield the same value?Whileselect checksum('674')select checksum('645')do not.  What is is about unicode data that caused the checksum to be the same?TIA</description><pubDate>Wed, 05 Aug 2009 15:17:01 GMT</pubDate><dc:creator>iBuildDW</dc:creator></item></channel></rss>