Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12

checksums and unicode data Expand / Collapse
Author
Message
Posted Monday, August 10, 2009 4:38 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Tuesday, November 25, 2014 11:54 AM
Points: 9,294, Visits: 9,492
Jeff Moden (8/9/2009)
Paul White (8/8/2009)
HashBytes (perhaps including the PK as a salt) is a good choice if the task is to detect changes.
CHECKSUM is quick and efficient for creating hash indexes - which are awesome if you need to search long strings (for example).


Even then, it can and will give an incorrect answer if you rely only on it. I'll admit that CHECKSUM will help narrow things down on long strings, though.

I'm, going to have to disagree with this one, Jeff, at least as far as HashBytes using the SHA1 algorithm is concerned. Assuming you did a Trillion HashByte comparisons of different data a day (and no one on earth is doing that many), it would still take about 6000 to 12000 Years before you were likely to get an accidental match. Those are odds that I can live with.


-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Post #768260
Posted Monday, August 10, 2009 5:21 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 5:34 PM
Points: 35,609, Visits: 32,200
Ummmm.... does SQL Server's CHECKSUM use the SHA1 algorithm because I was talking about that CHECKSUM. Obviously that doesn't take 6000 to 12000 years to happen because I've seen two of these same types of posts in the last 3 months.

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #768276
Posted Monday, August 10, 2009 5:48 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Tuesday, November 25, 2014 11:54 AM
Points: 9,294, Visits: 9,492
Jeff Moden (8/10/2009)
Ummmm.... does SQL Server's CHECKSUM use the SHA1 algorithm because I was talking about that CHECKSUM. Obviously that doesn't take 6000 to 12000 years to happen because I've seen two of these same types of posts in the last 3 months.

Oh no, sorry, CHECKSUM produces duplicates if you so much as swap the order of two of the characters. So it stinks for this purpose, and we are in agreement on that.


-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Post #768286
Posted Monday, August 10, 2009 5:49 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Tuesday, November 25, 2014 11:54 AM
Points: 9,294, Visits: 9,492
To clarify: HashBytes w/ SHA1 is nigh-unbreakable.

-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Post #768287
Posted Monday, August 10, 2009 6:22 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 4:51 AM
Points: 9,928, Visits: 11,199
RBarryYoung (8/10/2009)
To clarify: HashBytes w/ SHA1 is nigh-unbreakable.

Hey Barry,

On the other hand, it is relatively slow, only works on strings, and returns varbinary(8000).
If the task is to detect changes in a row of data, would you use HashBytes (SHA1) alone?

Paul




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #768297
Posted Monday, August 10, 2009 7:08 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 10:20 AM
Points: 12,927, Visits: 32,332
just looking for confirmation...i used my google-fu and found info and examples for using hashbytes with SHA1, ie
select HASHBYTES('SHA1','My Plain Text')
--results
0x6D99DDF6FE7A32547B6766E0BF88B1F50835F0FF

everything i read says that this is a one way operation, so you can use it to generate a unique value like checksum, but you cannot unhash this back to it's original value, right?

so this is not encryption, but rather a tool to generate a unique identifier, correct?


Lowell

--There is no spoon, and there's no default ORDER BY in sql server either.
Actually, Common Sense is so rare, it should be considered a Superpower. --my son
Post #768305
Posted Monday, August 10, 2009 7:46 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 5:34 PM
Points: 35,609, Visits: 32,200
Paul White (8/10/2009)
RBarryYoung (8/10/2009)
To clarify: HashBytes w/ SHA1 is nigh-unbreakable.

Hey Barry,

On the other hand, it is relatively slow, only works on strings, and returns varbinary(8000).
If the task is to detect changes in a row of data, would you use HashBytes (SHA1) alone?

Paul


In Lowell's example, it also turned a 13 byte string into a 20 byte Varbinary so it's expensive, too.

But, I believe that Barry's point was simply that it is unbreakable... not like CHECKSUM at all.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #768319
Posted Monday, August 10, 2009 8:49 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 4:51 AM
Points: 9,928, Visits: 11,199
Jeff Moden (8/10/2009)
In Lowell's example, it also turned a 13 byte string into a 20 byte Varbinary so it's expensive, too. But, I believe that Barry's point was simply that it is unbreakable... not like CHECKSUM at all.

So...SHA1 always returns 160 bits (20 bytes) regardless of the input. HashBytes also appears to be limited to 8000 bytes of input - 8000 ANSI characters or 4000 Unicode. Finally, it doesn't appear unbreakable either: Wikipedia Link.

I mentioned HashBytes way back - now I'm just pointing out some of the reasons that I have yet to use it in a real production system.

Paul




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #768333
Posted Tuesday, August 11, 2009 7:20 AM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Tuesday, November 25, 2014 11:54 AM
Points: 9,294, Visits: 9,492
Paul White (8/10/2009)
RBarryYoung (8/10/2009)
To clarify: HashBytes w/ SHA1 is nigh-unbreakable.

Hey Barry,

On the other hand, it is relatively slow, only works on strings, and returns varbinary(8000).
If the task is to detect changes in a row of data, would you use HashBytes (SHA1) alone?

Paul

Despite the datatype, I think that it actually only returns 19 bytes. And since VARBINARY is a string and *everything* converts to varbinary very easily, thats not that big a problem (nulls are a bigger problem). So yeah, I *might* use it in situations were I was doing remote comparisons through linked servers and just recording the hash every day for later comparison. The problem with the field by field comparison is that you have to have the entire previous record around to do it.

Though I think that I usually used MD5 in the past.


-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Post #768567
Posted Tuesday, August 11, 2009 7:24 AM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Tuesday, November 25, 2014 11:54 AM
Points: 9,294, Visits: 9,492
Paul White (8/10/2009)
Jeff Moden (8/10/2009)
In Lowell's example, it also turned a 13 byte string into a 20 byte Varbinary so it's expensive, too. But, I believe that Barry's point was simply that it is unbreakable... not like CHECKSUM at all.

So...SHA1 always returns 160 bits (20 bytes) regardless of the input. HashBytes also appears to be limited to 8000 bytes of input - 8000 ANSI characters or 4000 Unicode. Finally, it doesn't appear unbreakable either: Wikipedia Link.

I mentioned HashBytes way back - now I'm just pointing out some of the reasons that I have yet to use it in a real production system.

Paul

Hmm, I though that I counted 19 bytes... Anyway, qualifying myself again , "nigh-unbreakable" was referring to its use as a signature in detecting random data changes, not as a security device.


-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Post #768573
« Prev Topic | Next Topic »

Add to briefcase ««12

Permissions Expand / Collapse